©Copyright 2012 
Jacob T. Vanderplas 



Karhunen-Loeve Analysis for Weak Gravitational Lens 



Jacob T. Vanderplas 



A dissertation 
submitted in partial fulfillment of the 
requirements for the degree of 



Doctor of Philosophy 



University of Washington 
2012 

Reading Committee: 
Andrew Connolly, Chair 
Bhuvnesh Jain 
Andrew Becker 

Program Authorized to Offer Degree: 
Department of Astronomy 



University of Washington 



Abstract 

Karhunen-Loeve Analysis for Weak Gravitational Lensing 

Jacob T. Vanderplas 

Chair of the Supervisory Committee: 
Professor Andrew Connolly 
Department of Astronomy 

In the past decade, weak gravitational lensing has become an important tool in the study 
of the universe at the largest scale, giving insights into the distribution of dark matter, 
the expansion of the universe, and the nature of dark energy. This thesis research explores 
several applications of Karhunen-Loeve (KL) analysis to speed and improve the comparison 
of weak lensing shear catalogs to theory in order to constrain cosmological parameters in 
current and future lensing surveys. This work addresses three related aspects of weak lensing 
analysis: 



Three-dimensional Tomographic Mapping: (Based on work published in VanderPlas 



et al. 2011) We explore a new fast approach to three-dimensional mass mapping in 
weak lensing surveys. The KL approach uses a KL-based filtering of the shear signal 
to reconstruct mass structures on the line-of-sight, and provides a unified framework 
to evaluate the efficacy of linear reconstruction techniques. We find that the KL-based 
filtering leads to near-optimal angular resolution, and computation times which are 
faster than previous approaches. We also use the KL formalism to show that linear 
non-parametric reconstruction methods are fundamentally limited in their ability to 
resolve lens redshifts. 



Shear Peak Statistics with Incomplete Data (Based on work published in Vander- 
Plas et al. 2012 ) We explore the use of KL eigenmodes for interpolation across masked 



regions in observed shear maps. Mass mapping is an inherently non-local calculation, 
meaning gaps in the data can have a significant effect on the properties of the de- 
rived mass map. Our KL mapping procedure leads to improvements in the recovery 
of detailed statistics of peaks in the mass map, which holds promise of improved 
cosmological constraints based on such studies. 

Two-point parameter estimation with KL modes The power spectrum of the ob- 
served shear can yield powerful cosmological constraints. Incomplete survey sky cov- 
erage, however, can lead to mixing of power between Fourier modes, and obfuscate the 
cosmologically sensitive signal. We show that KL can be used to derive an alternate 
orthonormal basis for the problem which avoids mode-mixing and allows a convenient 
formalism for cosmological likelihood computations. Cosmological constraints derived 
using this method are shown to be competitive with those from the more conventional 
correlation function approach. We also discuss several aspects of the KL approach 
which will allow improved handling of correlated errors and redshift information in 
future surveys. 
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Chapter 1 

BRIEF INTRODUCTION TO COSMOLOGY 



By the late part of the 20th century, the standard model of Cosmology seemed to rest 
on firm foundations. The model, known as the Cold Dark Matter (CDM) model, consisted 
of a uniformly expanding universe, composed of baryonic matter, cold non-baryonic (dark) 
matter, and radiation, with space-time evolving according to the dynamics of Einstein's 
Theory of General Relativity. This dynamical description, as was realized by Alexander 
Friedmann, George Lemaitre, Howard Robertson, and Arthur Walker during the 1920s and 
1930s, predicts a dynamic universe, where space itself must expand or contract under the 
influence of the energy within it. The expansion of the universe, first observed via the 



characteristic redshifts of distant galaxies (Hubble, 1929), was thought to be slowing under 



the gravitational effect of the matter within it. 

Uniform expansion of space-time lent support to the notion that the early universe was 
filled with a hot, dense plasma from which the constituents of chemical elements formed. 



This theory of Big Bang Nucleosynthesis (BBN: Alpher et al. 1948) was, and remains, ex- 
tremely successful in explaining the relative abundance of chemical elements in the universe. 
Another important prediction within the BBN model is that a Cosmic Background Radi- 
ation should be present, due to the photons which free-streamed once the universe cooled 



enough for the gas within it to no longer be ionized (Alpher & Herman 1948). The CBR 



signal had been observed in detail (Smoot et al. 1992), resulting in a confirmation of the 



inflationary hypothesis and the standard model of cosmology. 

Nonetheless, the standard model had some cracks in its foundation: cosmological probes 
and studies of galaxy clusters yielded widely discrepant estimates of the matter content in 
the universe. The implied age of the universe in the CDM model was younger than the age 
inferred for the oldest observed globular clusters and white dwarfs. Something needed to 
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change. 



Observations of type la supernovae by Riess et al. (1998) and Perlmutter et al. (1999) 
offered a solution: these showed that distant supernovae were fainter than expected in a 
standard CDM model. This implied that the expansion of the universe was accelerating, 
due to a cosmological component dubbed dark energy. This dark energy has an effective 
negative pressure, which overcomes the gravitational attraction of matter and causes the 
expansion of space to accelerate. This additional energy component in the universe solved 
many of the problems posed by CDM, and is now part of the standard A-CDM cosmological 
model (A refers to the cosmological constant first proposed by Einstein). 

Led by these supernova results, as well as new observations of the cosmic microwave 



background (CMB; Spergel et al. 2003), the baryon acoustic oscillations (BAO; Eisenstein 



et al. 2005), and other observational campaigns, the last 15 years has seen a surge in 



precision cosmological measurements. The dark energy postulated to explain supernova 
distances is now known to make up over 70% of the energy density of the universe, and 
have an equation of state consistent with it being due to vacuum energy or a cosmological 



constant (Kessler et al. 2009, Komatsu et al. 2011). Future surveys in many diverse areas 
of astronomy are seeking to place even tighter constraints, allowing greater insight into the 
nature and evolution of dark energy. 

With such a wide and diverse field as Cosmology, we can't hope to offer a complete 
introduction of the relevant theory in this work. For a more complete discussion, there are 
several very well-written books available; much of the material discussed below is taken 



Peebles 


1993 


Peacock 


1999 



Ryden 


2003 


Longair 


2008) 



background of the physical study of cosmology. We will begin with a discussion of the 
FLRW metric (named for Friedmann, Lemaitre, Robertson, and Walker) which describes 
the geometry of space-time. Next we'll move on to define the Friedmann Equations, which 
condense the field equations of Einstein's General Relativity to the basic pieces needed to 
describe the dynamics of a globally homogeneous and isotropic universe. We will then briefly 
discuss the relevant theory behind gravitational structure formation within this model. 
This paves the way to relate theory to data, using observations including cluster counts, 
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correlation functions, and Fourier power spectra. Finally, we will develop the equations 
describing gravitational lensing in the weak limit, and show how weak lensing observations 
can be used to gain insight into the parameters of our cosmological model. Throughout, 
we'll point out the relevant observational work which supports and constrains these theories. 



1.1 FLRW Metric 

The physical study of cosmology in its classical form is based on the fundamental assump- 
tion of symmetry: that the universe on the largest scales is isotropic. Homogeneity is an 
expression of translational symmetry: the appearance of the universe does not depend on 
the location of the observer. Isotropy is an expression of rotational symmetry: the appear- 
ance of the universe does not change with respect to the orientation of the observer. These 
assumptions are clearly incorrect at small scales - our galaxy has a much higher density of 
stars in the central bulge than in the outer halo, for example - but they appear to hold at 
the largest scales. At distance scales larger than the size of typical superclusters (about 50 
Mpc or more), the distribution of quasars and galaxies reflect the nearly homogeneous and 



isotropic nature of large scale structure (Yadav et al. 2005, Sarkar et al. 2009). More im 



portantly, the Cosmic Microwave Background appears homogeneous and isotropic to within 
one part in 10 5 , giving evidence that our assumptions of homogeneity and isotropy are 
well-founded for the universe as a whole (For an interesting discussion of the limits of this 



approach, however, see Maartens, 2011). 



The most general metric for a homogeneous and isotropic space-time is due to Howard 
Robertson and Arthur Walker, who showed that the space-time distance ds in spherical 
coordinates is given by 



ds 2 = -c 2 dt 2 + a{t) 2 [dr 2 + S 2 K (r)d$ 2 



(1.1) 



where t is the time coordinate, r and $ are comoving spherical coordinates, a(t) describes 
the distance scale (which may be an arbitrary function of t), and S 2 (r) is the curvature 
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term. The curvature term depends on the curvature, k, which may be either +1, —1, or 0: 



S K (r) 



R sin(r/R) k = +1 

r k = (1.2) 

R sinh(r/i?) k = — 1 

where R is the radius of curvature today. Often, the curvature sign k and radius R are 
compactly expressed in a single curvature parameter k, such that k = k/\k\ and R = \k\~ L / 2 . 

Robertson and Walker derived the above metric from purely geometric arguments. An 
interesting aspect of this metric is the scale factor a(t). A general homogeneous and isotropic 
universe is not necessarily static: it can be expanding or contracting with time. The detailed 
nature of this expansion cannot be derived from purely geometric means: the description 
of the dynamics of cosmic expansion comes from the field equations of Einstein's theory of 
General Relativity. 

1.2 The Friedmann Equations 



The Robertson- Walker metric (eq. 1.1) is a purely geometric result, where the scale factor 
a(t) is arbitrary and unspecified. Friedmann and Lemaitre had earlier independently derived 
this expression from Einstein's field equations, with the addition of certain dynamical con- 
straints on the scale factor. For this reason, the Robertson- Walker metric is often referred to 
as the Friedmann-Robertson- Walker metric or the Friedmann-Lemaitre-Robertson- Walker 
(FLRW) metric. The general relativistic constraints on the scale factor a(t) are compactly 
expressed by the Friedmann equations-^] 

d\ 2 8vrG A kc 2 . , 

e +o -72152 (1-3) 



3c 2 3 a 2 R 2 



a AnG , OTV . A ._, 

u = -^ e + 3P ^T (L4) 

where G is the gravitational constant, and c is the speed of light. The scale factor a is 
understood to be a function of time, with the dots representing derivatives with respect to 
time. By convention, the scale factor at the present day is chosen to be ct(io) = 1. e and P 



1 For a derivation of the Friedmann equations from the field equations of general relativity, refer to Peebles 
(19931 
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are the energy density and pressure of the mass-energy in the universe, and A represents the 



cosmological constant. Equations 1.3 and 1.4 are the first and second Friedmann equations, 



respectively. The third Friedmann equation can be easily derived from the first two: 

e = -3 a (e + P). (1.5) 
a 

This expression is equivalent to the first law of thermodynamics expressed for the universe 
whole. 

1.2.1 Time Dilation and Redshift 

General Relativity tells us that light always travels along null geodesies, that is, the space 
time interval in eqn. |1.1| satisfies ds = 0. For a light beam with no angular deflection dQ, 
this gives 

dr = -^-dt. (1.6) 
o(t) V ' 

If a beam of light is emitted at time t e and travels a comoving distance r, the time t Q that 

the light is observed can be found by solving 



r 



[ ° -£r* (1.7) 
Jt e a(t) 



If a second photon is emitted a short time later at time t e + At e , and arrives at time t + At , 
this gives 

"t + At 

t e +At e a{t) 

to c cAt cAt e 

^7^ + ——-——, (1.8) 
, a{t) a(t ) a(t e ) 

where we have used a first-order approximation. Equating these two expressions gives for 
small At: 

At = At e ^\. (1.9) 
a(t e ) 

In an expanding universe, the observed time interval is longer than the time interval in the 
emitted frame. This time dilation is a general feature of space-time governed by Einstein's 
field equations. 
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The time dilation has an observable effect on emitted light: if an atom emits light with 
a period P e = At e = A e /c, then the observed wavelength A Q and the emitted wavelength A e 
are related by 

A = A e °M. (1.10) 
a(t e ) 

The wavelength of light is lengthened due to the expansion of space. For historical reasons, 
this expansion is generally parametrized using the redshift: 



Because we define a(t ) = 1, we have 



1 + *=^- 
a(t e ) 



Thus the redshift of a light source gives us a direct measurement of the scale factor at 
the time that photon was emitted. As such, it can be substituted for a as the dependent 
variable in the above equations with a suitable change-of- variables; we will switch between 
these two conventions depending on which is convenient. 

1.2.2 Equation of State 

The Friedmann equations can be further simplified by relating the pressure P and energy 
density e in terms of a linear equation of state parameter 

w = P/e. (1.13) 



Using this, the solution of eqn. |1.5| gives 

e = £o a -3(i+™) (1.14) 

for w constant in time. Here £o = s(to) is the energy density today, and we have used 
the standard convention a(to) = 1. Given this parametrization, we can now separate the 



various contributions to the mass-energy of the universe and re- write eqn. 1.3 in terms of 
the equation of state for each: 

' w 
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where e Wj o is the energy density of each species at present. The various possible contributions 
are: 



Vacuum energy /Cosmological Constant: The vacuum energy or cosmological constant 
A has energy density that does not change with time. So by Equation 1.5 P = —e 
and w = — 1. 



General Quintessence: Quintessence is defined as any sort of matter or energy field that 
can balance the gravitational attraction, leading to accelerated expansion. By Equa- 
tion 



1.4, d/a > only if w < —1/3. We see that the cosmological constant is a form 



of quintessence. 



Curvature: Though it may seem strange to think about the curvature of space as having 
an energy density, in General Relativity the curvature is, in some sense, a stand-in 



for gravitational potential energy. Comparing eqns. 1.3 and 1.15 the dependence of 
the curvature term on scale factor k oc a~ 2 means it has an effective equation of state 
parameter w = —1/3. This makes it clear why curvature does not appear in the 
second Friedmann equation (eqn. 1.4): for w = —1/3, e + 3P = 0, and the presence 
of curvature cannot lead to a change in the expansion rate. 



Non-relativistic matter: Non-relativistic matter (often known as cold matter) has kinetic 
energy much less than its rest mass; in other words P ~ kT <C e. This corresponds 
to w <C 1, and we will often approximate this as simply w = 0. 



Relativistic Matter: Relativistic matter (known as warm matter or hot matter) has en- 
ergy given by E 2 = p 2 c 2 +m 2 c 4 , where p is the total momentum and m is the rest-mass. 
If pc <^ mc 2 , we have the non-relativistic case above, and find w — > 0. If pc ^> mc 2 , 
then in analogy to the radiation case discussed below, we find w — > 1/3. For general 
relativistic matter, this leads to < w < 1/3, with the exact value dependent on the 
energy density. 
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Radiation: Radiation has energy per particle proportional to the momentum times the 
speed of light. From basic electrodynamics, one can show that for an ideal photon 
gas, each spatial degree of freedom contributes equally to the energy, so that the 
pressure is P = dp/ dt = e/3. So relativistic mass-energy has w = 1/3. 



1.2.3 Evolving Equation of State 



In Equation 1.14 we show the solution of eqn. [L5j for constant w. Another possibility (es- 
pecially applicable for general quintessence models) is that the equation of state parameter 
w evolves with time. 

1+™«K ,1 



e(a) = Eq exp 



-da' 



(1.16) 



Many parametrizations of the form of w(a) have been proposed. Perhaps the simplest is a 
model which is simply linear in the redshift z, i.e. 



w(z) PS UIq + W\Z. 



(1.17) 



Though simple, this parametrization is encumbered by the fact that z changes very quickly 
with time, especially in the early universe. A better choice is the CPL parametrization 



(Chevallier & Polarski, 2001 Linder, 2003), which is of the form 



l + z' 



or equivalently 



w(a) Riwjl w a (l - a) 



(1.18) 



(1.19) 



which is linear in the scale factor rather than the redshift. Adopting this parametrization 
gives 

(1.20) 



e(a) = eoa -3(i+«o+^) e -3«;„(i-a)_ 



One of the main goals of future cosmological surveys is to put meaningful constraints on 
the time-evolution of dark energy, often by placing constraints on w\ or w a . This will be 
discussed further in later sections. 
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1.2.4 Hubble Parameter 



The first Friedmann equation (eqn. 1.3) is commonly expressed in terms of dimensionless 



parameters via the generalization in eqn. 1.15 If we define the Hubble parameter 



H 



and let Hq be the value of the Hubble parameter today, then eqn. 1.15 becomes 



\H 



8vrG 
31^2 



E 



-3(1+™) 



(1.21) 



(1.22) 



The constant in front of the sum has dimensions of inverse energy density: this motivates 
the definition of the critical density 

Pc _ 3H 2 c 2 



8ttG 



(1.23) 



where, to be explicit, both the critical densities e c , p c , and Hubble parameter H are functions 
of time. With this definition, and defining the dimensionless density parameter 



n w (t) = e w (t)/e c (t) 



the Friedmann equation can be compactly expressed 

2 



H 

Ho 



o a 



-3(l+w) 



(1.24) 



(1.25) 



where the subscript indicates the value at present. Alternatively, we can express the 
Friedmann equation as simply 

Y J tt w (t) = l- (1-26) 

w 

Notice that if the sum of all components £l w with the exception of f2 K is unity, then we must 



have curvature Q K = 0. This shows the meaning of the critical density (eqn. 1.23): if the 



energy density in the universe satisfies e > e c (i.e. f2 > 1), then k = +1 and the universe 
is spatially closed. If e < e c (i.e. £1 < 1), then k = — 1 and the universe is spatially open. 
If the energy density is exactly equal to the critical density, then the curvature k = and 
the universe is flat. Constraints from the Cosmic Microwave Background show that our 



universe is flat to a very high precision (Komatsu et al. 2011). In addition, inflationary 
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perturbation analysis shows that if the Universe is close to flat, it probably is flat. For this 
reason, in many of the below derivations we will assume for simplicity that f2 K = and 
consequently S K (r) = r. 

For simplicity, we can limit our consideration to the principal contributors to the density 
of the universe: dark energy matter (f2jvf)> radiation (Or), and curvature (fi K ). Ne- 

glecting other components gives the familiar dimensionless form of the Friedmann Equation: 

(JD = n M,o a~ 3 + tt Rfi a~ 4 + n Kfi a~ 2 + fl A (1-27) 

1.3 Cosmological Distance Measures 



The FLRW metric of [1.1 and the Friedmann equations of §1.2| lay the basic framework 
for the study of cosmology. A large portion of the history of 20 th century cosmology 
surrounds various attempts to understand the relative contributions of matter, radiation, 
curvature, and dark energy to the Hubble parameter, which measures the expansion rate 
of the universe. The exact nature of these various contributions has far-reaching conse- 
quences, determining how, when, and where galaxies, clusters and other structure form and 
evolve; determining the age of the universe and the character of its evolution through time; 
determining cosmic abundances and the initial conditions of stellar evolution and planet 
formation; and determining the nature of the universe's beginning, and the possibility of 
its eventual end. Observational measures of these consequences allow constraints of the 
properties of the various VL w (z). 

Eqn. |1,27 is simply a first-order differential equation in a: For various choices of the 



density parameters Q w , it can be solved to yield a curve describing the scale factor a as 
a function of time t. A straightforward route to placing observational constraints on the 
densities of various components, then, would require simply measuring the value of a at 
several times t and performing a multidimensional fit to these observed data points. 



As discussed above in £ 1.2.1 the redshift of light offers a direct measurement of the 



scale factor at a given time. In order to use eqn. 1.27 to gain information about the 
cosmological densities Q w , then, we must be able to observe some property related to the 
time t e of emission of these photons. To enable this, we'll introduce the concept of distances 
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in Cosmology. 

Distance measures in cosmology are a potentially confusing subject. An excellent re- 



source describing these can be found in Hogg (1999). Here we'll briefly define four relevant 



distance measures: the comoving distance, the proper distance, the angular diameter dis- 
tance, and the luminosity distance. 

Comoving Distance: The comoving distance is the distance r which enters into the 
FLRW metric, eqn. |1.1| This distance is constant for two objects moving with the 
expansion of space. Using the FLRW metric and setting ds = 0, one can shown that 
the comoving distance for an object with redshift z is given by 

f* dz' 

+)-cf a m . (1.28) 

Proper Distance: The proper distance d p (z) is the simultaneous separation between two 
objects. From the FLRW metric with time interval dt = 0, one can show that the 
proper distance is given by 

w = rr z - (i-29) 

Angular Diameter Distance: The angular diameter distance d A (z) is the ratio of the 
proper size to the observed angular size (in radians) of an extended source. From the 
FLRW metric with dt = dr = 0, one can show that the angular diameter distance is 
given by 

d A (z) = f^. (1.30) 

Luminosity Distance: The luminosity distance cIl(z) relates the emitted flux of a source 
to the observed flux. Taking into account the angular dilution of the flux, as well as 
the redshifted energy of each photon and time-delay of photon arrival leads to 



d L (z) = (l + z) 2 d A (z) 
= (l + z)S K (r) 



(1.31) 
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By measuring one of these distances as a function of redshift, we are effectively measuring 
both the dependent and independent variable in the dimensionless Friedmann equation, 
eqn. 1.27| In this way, it is possible to constrain the combinations of cosmological parameters 



Q w that fit the data. 

1.4 Standard Candles: Cosmology via Luminosity Distance 

Standard candles have been one of the primary methods for measuring cosmological param- 
eters. The earliest measure, from Edwin Hubble, used Cepheid-type variable stars that have 



an absolute magnitude correlated with their pulsation period (Leavitt & Pickering, 1912 



Hubble, 1929). Through observations of Cepheids in nearby galaxies, Hubble found a posi- 
tive correlation between the luminosity distance to each galaxy and its apparent recessional 
velocity, indicating a positive value for -ffoQ Using our formalism above, we can show that 
the rate of change of the proper distance to any object is 

j t d p (t) = H(t)d p (t) 

« H d L {t)[l + O(z)], (1.32) 

where in the second line we have separated corrections of order z. Because Hubble used 
a sample with z <~ 0.005, the zeroth-order linear fit is well within observed errors. Thus 
Hubble's observations can be seen as a first attempt at constraining cosmological parameters 



in Equation 1.27 through the simultaneous measurement of redshift and luminosity distance. 

In the 70 years after Hubble's discovery, there were many attempts to confirm and im- 
prove upon his discovery and use it to derive tighter constraints on the slope Ho of the 
Hubble relation. A large majority of these have been based on standard or standardizable 
candles. These standardizable candles have been based on several empirical relations, in- 



cluding the period-luminosity relationship of Cepheid variables (Leavitt Sz Pickering, 1912) 



on the Tully-Fisher relationship between brightness and rotation speed of spiral galaxies 



(Tully & Fisher, 1977); on the Faber- Jackson/Fundamental Plane relationship between 



2 Although Hubble does not frame this measurement in terms of the Friedmann equations, he cautiously 
mentions the potential relationship of the observations to the "de Sitter effect" , a predicted redshift within 
a particular solution of Einstein's equations which is a special case of the FLRW metric and Friedmann 
equations (i.e., that with Qm = 0, Q K = 0, and Q,a = !)• 
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brightness, velocity, and surface brightness for elliptical galaxies (Faber & Jackson, 1976 



Djorgovski & Davis, 1987); and on the relationship between luminosity and decay timescale 



for type la supernovae (Phillips 1993). Due to their extreme intrinsic brightness and min- 
imal scatter in calibrated luminosity, type la supernovae have become a very important 
tool for determining cosmological parameters, but they cannot be used alone: independent 
measures are required to calibrate the distance scale of type-I supernovae, making all the 
above approaches important. Perhaps the most comprehensive study to date of these var- 



ious standard candles is the HST Key project (Freedman et al. 2001), which combined 
the above measures and others to arrive at a value of Hq = 72 ± 8 km/s/Mpc. This is 
consistent with the tighter constraint from the WMAP 7- year CMB analysis, which gave 
H = 70.4 ± 2.5 km/s/Mpc. 

The common theme in the above methods is that they are based on the idea of a standard 
candle: if we can determine the intrinsic brightness of an object as well as its redshift, then 
we can compare this to the apparent brightness and constrain the Hubble parameter H(t). 
Another path to this sort of constraint comes from standard rulers rather than standard 
candles. If we know the redshift as well as the intrinsic size of an object, then we can use 
its apparent size to constrain cosmological parameters. One such standard ruler is given by 
the characteristic scales of structure in the universe. 

1.5 The Growth of Structure 

There are several methods of cosmological parameter estimation that rely on the idea of 
standard rulers. Some examples are the observation of the anisotropy scale in the Cosmic 
Microwave Background, and the observation of the Baryon Acoustic Oscillation scale. An- 
other powerful method relies on detailed modeling of the growth of structure. As we will 
see, along with offering the possibility of a standard ruler, consideration of structure growth 
can offer other cosmologically interesting observables. 

The distribution of density throughout the universe can be expressed in terms of the 
density contrast 

p(x,t)-p b {t) 

6{x,t) = — , (1.33) 

Pb(t) 
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where the background matter density is 

Pb(t) oc 



1 



a(tf 



(i.e. we assume the background consists of cold matter with it; 
definition results in expressing the density field of the universe as 

p(x,t)=pb(t)[l + 6(x,t)]. 



(1.34) 



0). Rearranging this 



(1.35) 



1.5.1 Gravitational Instability 

We can proceed by treating matter as an ideal fluid with a velocity field u(x, t) and pressure 



P(x,t) (Longair, 2008). In this case, it is governed by three equations: the continuity 
equation, which describes conservation of mass, 

'dp s 



Ot 



+ pV x • u = 0, 



the Euler equation, which specifies conservation of momentum, 



du\ . „ . V X P 

— +«.v,« + — 

dt J x P 



and the Poisson equation, which describes gravity in the Newtonian limit 



$ = AnGp. 



(1.36) 



(1.37) 



1.38) 



Transforming to comoving coordinates r = x/a(t), defining comoving time derivatives 
d/dt = d/dt + u ■ V, and expressing in terms of the dimensionless density contrast 5(r,t) 
(eq. 1.33[ ) gives 

where indicates the Laplacian with respect to comoving coordinates (for derivation see 



section 11.2 of Longair, 2008). 



We can gain insight by expressing this in terms of wave-like solutions of the density 
contrast 5k(r,t) oc exp[i(k ■ r — ujt)], where k is the vector of spatial wave-numbers, and ui 



is the oscillation frequency. In terms of these Fourier modes, eqn. 1.39 becomes 

5 k + 2HS k - 5 k (4irGp b - k 2 c 2 s ) = 0. 



(1.40) 
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This differential equation describes an exponentially growing (or decaying) density fluctu- 
ation, with a "drag" term governed by the Hubble parameter H(z). The rate of growth 
of perturbations 5k depends on the balance between the gravitational force through AirGpb 
and the pressure through k 2 c 2 . The scale Aj = 2n/kj where these forces balance is called 
the Jeans length: 

This is the scale above which pressure cannot halt gravitational collapse. The length is 



directly proportional to the sound speed c s = y/dP '/dp and so depends on the equation of 
state of the total energy in the universe, as well as the average density pb- The different 
components (radiation, matter, etc.) have different equations of state (£ |1.2[ ) and evolve with 
different dependencies on the scale factor a, and so the Jeans length also evolves through 
the course of cosmic history. Thus the scale of nonlinear collapsed structure as a function 
of z contains information which can be used to place constraints on the components which 
make up the Universe. 

In particular, we can consider two relevant regimes: radiation dominance and matter 
dominance. In the regime where radiation dominates the energy density, the equation of 



state w = 1/3 leads to c 2 = c 2 /3. In a flat universe pb is the critical density (eqn. 1.23) and 
the Jeans length for a radiation dominated universe can be expressed 



This scale is on the same order as that of the horizon scale, A s ~ 2c/(Hy/3). This means 
that sub-horizon modes cannot collapse during the epoch when radiation dominates the 
energy of the universe. 

In the matter-dominated regime, there are two possibilities: if radiation and matter are 
coupled, the pressure comes from the radiation while the density is dominated by matter. 



This gives c s ~ cr£l r /Q,M ~ c (l + - z )^r,o/^m,o- Putting in numbers from WMAP (Komatsu 



et al. 



2011), we find approximately c s ~ 10 2 c\/l + z. If radiation and matter are decou- 



pled, the pressure comes from the nonzero temperature of matter itself: c 2 ~ kT/m p where 
m p is the proton mass. Assuming the matter is in thermal equilibrium with the CMB, then 
temperature goes as T oc (1 + z). Again using observational constraints from WMAP, we 
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find c s ~ 10 7 C\J\ + z. Thus the Jeans length during the matter-dominated epoch is given 
by 



, Wl /;», i 10 2 before decoupling 

A j « X j ) v / TT^ x { P * (1.43) 

10~ 7 after decoupling 



The redshift of decoupling is approximately z ~ 1100 (for a physical argument for this 
see 



Ryden, 2003). This means that prior to decoupling, growth below approximately sub- 
horizon scales is suppressed by pressure. After decoupling, the Jeans length shrinks by five 
orders of magnitude, allowing linear structure on this scale to form. The observable effect 
of this sharp transition in the growth of structure will be discussed further below. 

A related question is that of the rate of structure growth on scales larger than the Jeans 
length. This can be addressed by defining the linear growth factor D{t) such that 

6{r,t) = 6 {r)D(t). (1.44) 

Using pb = QmPc and assuming negligible pressure (i.e. scales above the Jeans length), we 



can rewrite eqn. 1.39 



as 



D + 2HD- ^n M H 2 D = 0. (1.45) 

This is a second-order differential equation, which will, in general, admit a solution with a 
growing mode and a decaying mode: 

D(t) = A l D 1 {t) + A 2 D 2 (t). (1.46) 

In a flat universe dominated by matter, the Friedmann equation gives H(t) = 2/(3t), leading 
to solutions 

D{t) = Arf 2 / 3 + A 2 t~ 1 . (1.47) 
The first term quickly dominates the second, and structure grows as 5 oc 

£2/3 

oc a, where 



the last proportionality comes from solving eqn. 1.27 for a matter dominated universe 



In general, the growth factor for a flat universe is 

r a da' 

D{a) K I WWW' (L48) 

where the normalization is usually chosen such that D{a) = 1 at the present day. 
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A radiation-dominated universe presents a more complicated case: eqn. 1.45 assumes the 
pressure is negligible compared to the gravitational force. This approximation breaks down 
in cases when Qm ~ ► 0- For these cases, we need a more involved perturbative treatment. 

1.5.2 Perturbation Treatment 

To explore the growth rate in a universe where fi^ is small, we will perform a perturbation 
analysis of Friedmann's equations. A full discussion of this treatment can be found in 



Peebles (1993). Here we will briefly outline a schematic approach from Kolb & Turner 



(1990) which leads to the same results. 



By Birkhoff's theorem (Birkhoff & Langer 1923), a small spherical over-density can be 



treated as if it were an independent homogeneous universe embedded within the background. 
We'll assume the background is represented by a flat universe with 



, 8vrG 



H = ^ e °' ( L49 ) 

and that a spherical perturbation has a small positive curvature 

o 8ttG kc 2 

H = ^ - <L50) 

The boundary requires that the expansion rate H be equal between the two; combining 
these we find 

El - £ _ 3kc 4 1 

e 8<KGR 2 aW 1 ' 

For a matter-dominated universe, Eq oc a -3 which gives 5 oc a as above. For a radiation- 
dominated universe, eo oc a~ 4 which gives 5 oc a 2 . 

1.5.3 Matter Power Spectrum 

In summary, the above results show that 

• In the radiation-dominated regime, fluctuations on scales above Xj = (ttc\/8)/(3H) 
grow as 5(a) oc a 2 . 

• In the matter-dominated regime, after decoupling, scales above Aj M "* « 10~ 7 Aj y/\ + z 
grow as 5(a) oc a. 
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The ratio of radiation density to matter density is 

Um{Z) ilMfi 

So before the redshift of radiation-matter equality, z rm ~ ^M,oA^.R,o> radiation dominates, 
while after this redshift matter dominates. Thus the important scale is the horizon scale 
X rm at redshift z = z rm . Modes on length-scales k > \ rm will grow as 5 oc a 2 for a < a rm , 
and 8 oc a for a > a rm . Modes with length-scales k < \ rm will grow as 5 oc a 2 as long as 
the horizon distance dh or {a) < k, at which point the growth will be suppressed by radiation 
pressure. At a > a rm , the Jeans length shrinks by a factor of about 10 5 , and modes larger 
than Xj 1 resume growth with 5 oc a. 

Thus, density modes with k > k rm (that is, scales smaller than the matter-radiation 
equality horizon scale) are suppressed by a factor of (ak/a rm ) 2 oc k~ 2 . This motivates use 
of the power spectrum of density fluctuations (for details see Appendix [A]) : 

Pk = (\6 k \ 2 ), (1.53) 

in theory a measurable quantity, which will have a distinct break at k = k rm for the reasons 
discussed above. In particular, if the power spectrum of primordial fluctuations is a simple 
power law with oc k n , then after decoupling the power spectrum will be approximately 

{k ■ for k ^ krm 
(1.54) 
k n , for k < k rm - 

Under most inflationary scenarios, it is expected that the power spectral index n ~ 1 



(Peacock 1999 ). The normalization of the power spectrum depends on the magnitude of the 
primordial fluctuations, and is dependent on the cosmological model. This normalization is 
typically expressed in terms of the parameter a§ , which measures the magnitude of average 
fluctuations within a sphere of radius 8 Mpc. For a more thorough discussion of power 
spectra and the as normalization, see Appendix \A\ 

1.5.4 Putting it all together 

The sum of the above discussion paints a general picture of the growth of structure within 
the universe presents several concepts with readily observable consequences: 
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The size and mass/length scale of clusters as a function of redshift depends on the 



length-scale of gravitational instability Aj (eqn. 1.41), which in turn depends on the 



densities of matter and radiation in the universe. Clustering also depends on the linear 



growth rate D{z) (eqn. 1.48) and its nonlinear extensions, which also depend on the 



relative cosmic densities as a function of z. 



• The power spectrum of density fluctuations (eq. 1.53) has a turn-off at a length scale 
that is closely related to the horizon distance at the epoch of radiation-matter equality. 
This scale acts as a standard ruler, such that the angular diameter distance can be 
estimated at a particular value of z, leading to cosmological constraints through the 
same means as the standard candle method discussed in 31.41 



• The linear growth factor (eqn. 1.48) affects the normalization of the power spectrum. 
Therefore, measuring the power spectrum as a function of z leads to cosmological 
constraints through the dependence of D{z) on cosmological parameters. 

So we see that there are powerful cosmological constraints that can be obtained through 
the observation of the density fluctuations and clustering of matter through the universe. 
There are several caveats, however: the above discussion focuses on the linear approximation 
(that is, we discuss the behavior of perturbations of order 5, while ignoring 5 2 ). This is 
sufficient for small 5, but not for when 5 is much larger than 1. At small scales, structure 
is well beyond the regime where this approximation holds: for example, 5 for our galaxy 
is approximately 10 5 ! Even moderate-sized galaxy clusters (i.e. a hypothetical cluster 
2Mpc across, containing 50 Milky- way sized galaxies) have 5 of order 10 2 — 10 3 . Clearly, 
nonlinear effects must be taken into account when measuring clustering. These effects can 



be estimated several ways; one of the more successful is the halo model of Smith et al. 



(2003), which is calibrated using semi-analytic results from N-body simulations. Nonlinear 
effects lead to a significant boosting of power on small scales. 

A second caveat is that the structure we are referring to here is that made up by the bulk 
of the matter in the universe: collisionless dark matter. Dark matter, being non-luminous, 
cannot be observed directly through emitted light. The power spectrum of luminous matter 
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can be theoretically mapped to the underlying mass power spectrum, but this mapping 
requires uncertain corrections and introduces systematic errors that are difficult to calibrate. 

With careful accounting for the above two caveats, observations of the spatial distri- 
bution of luminous matter have led to interesting cosmological results. One of the most 
important surveys in this regard has been the Sloan Digital Sky Survey (SDSS), which 
measured spectra of nearly a million sources across over 8000 square degrees, leading to 
accurate photometric redshifts of hundreds of thousands of galaxies across over a third of 
the sky. Measurements of the angular power of these galaxies have been successfully used 



to constrain cosmology both from the nonlinear power spectrum (Tegmark et al. 2006) and 



the Baryon Acoustic Oscillation (BAO) signal (Eisenstein et al. 2005). 

The BAO signal is another cosmological constraint based on the idea of a standard 
ruler. In the above discussion, we mention the role of pressure in suppressing the growth 
of structure before radiation and matter are decoupled. This suppression by pressure leads 
to acoustic oscillations in the plasma of the radiation and baryons. When the radiation 
and matter decouples, these oscillations freeze-out and form the seeds of baryonic structure 
growth. The remnants of this freeze-out can be observed in baryonic structure today, and 
the characteristic length scale has been used to place tight constraints on cosmological 



parameters (Eisenstein et al. 2005). 



1.6 Gravitational Lensing 

In order to circumvent the astrophysical bias involved with mapping luminous matter to the 
underlying dark matter, it would be preferable to observe the dark matter directly. This 
is where gravitational lensing can make an important contribution. Einstein's theory of 
General Relativity predicts that photons will be deflected in the presence of a gravitational 
field. Under certain circumstances, this deflection can be detected and used to learn about 
the nature of the gravitating matter. 

1.6.1 Simplifying Assumptions 

The propagation of light through a region of gravitational potential <I>(r) is, in general, a 
very complicated problem, only analytically solvable for potentials with various symmetries. 
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In cosmological contexts, however, it is safe to assume that the universe is described by a 
Robertson- Walker metric, with only small perturbations due to the density fluctuations 
described by the potential <£. In this case, the gravitational deflection of a photon can be 
described by an effective index of refraction given by 

2 



n 



1 



(1.55) 



(see Narayan & Bartelmann, 1996, and references therein). As in conventional optics, light 
rays are deflected in proportion to the perpendicular gradient of the refraction index, such 
that the deflection angle a is given by 



a 



V ± n dD 



C ./o 



Vi$ dD 



(1.56) 



where Ds is the distance from the observer to the photon source. 

For a point-mass located at a distance Dl and an impact parameter b, with Dl 3> b 



and Ds ~ 2Dl, equation 1 1 . 56| can be integrated to give 

2 



AGM 

~b^~ 



1 



- - 

2\D L 



+ o 



D L 



(1.57) 



The first-order approximation is twice the deflection predicted by Newtonian gravity for a 
particle of arbitrary mass moving at a speed c. It is important to note here that to first 
order, the deflection does not depend on the distance to the lens or source. That is, for a 



mass distribution located at a distance Dl » b, equation 1.56 can be approximated 



a 



D L +5D 



(1.58) 



c JD L -8D 

for 6D sufficiently greater than the size scale of the mass-distribution in question. So, to a 
very good approximation, the incremental deflection 5a of a photon at a given point along 
its trajectory is entirely due to an overdensity of matter with a thickness 2 5D, oriented 
perpendicular to the unperturbed photon trajectory. This is the thin lens approximation. 



1.6.2 Lensing Geometry 

For a mass-sheet located at a distance Dl, and a photon source located at a distance Ds 
(with Dls = Ds — Dl) geometric considerations in the small-angle approximation (see 
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Figure 1.1: The geometry of gravitational lensing 



Figure [Li]) yield the relation 



e = p + ^k (i.59) 

J->S 



where 9 and j3 are the observed and true positions of the source, respectively. Rescaling a 
in more convenient units gives 

9 = j3 + a (1.60) 



where we have defined 



1.6.3 Continuous Mass Distribution 



a = — —a (1-61) 
Ds 



In the case of a continuous mass distribution, we can recall the remarks of section 1.6.1 
and define a surface- mass density for a mass-sheet located at a redshift zl- 

fD L +8D r-z L +5z 

PM (9,D)dD = ^ \ 

ID L -8D c Jz L -8z 

where £m = Pmc 2 is the energy density of matter, 9 is the apparent angular position, and 
z and D{z) are the redshift and line-of-sight distance, respectively, with Ds = D(z s ). 



[■D L +8D i rz L +8z jn 

E(9,z L )= PM (9,D)dD = ^ s M (d,z) —dz (1.62) 

JDt-SD c Jz,-8z az 
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A matter distribution pm{9, z), an d its Newtonian potential <&(9, z) are related by Pois- 
son's equation: 

X7 2 $(8, z) = 4irGp M (9, z) (1.63) 



It is convenient to define the unsealed lensing potential tp, given by 

-D(z s ) 



ip(9, z s 



<S>(9,z(D)) dD 



o 



Zs -> dD 

$(M)V dz 

az 



(1.64) 



Using the approximation in equation 1.58 , we can write this in terms of multiple mass-sheets, 
such that 

_ _ j-Di+SD 

(1.65) 



with D i+ i = Di + 2SD. 

The gradient of ip with respect to £ = D^O is 



rDi+SD 

= V V^(# 4 ) = T / V^dD 

i iJDi-SD 



(1.66) 



Comparing this with (1.58) and (1.61) gives the incremental deflection angle in terms of the 



lensing potential of a mass-sheet: 



5di = 4-^^V^(^i) 



c 2 D S 

Further simplification can be made by rescaling the lensing potential, defining 

2 D LS 



(1.67) 



c 2 D L D S 



so that we are left with 



6a i {e,z L )=V e (5iP i (9, 



Z L, 



(1.68) 



(1.69) 



Defining the total scaled lensing potential ip = Stpi, and the total deflection a = ^ 53i, 
we obtain 

a(9,z s ) = V e %b(9,z s ) (1.70) 



The Laplacian of Sipi with respect to theta is given by 

2D LS D L f D ^ SD 



c 2 D s Jd l -8d 



(1.71) 
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Using (1.62) and (1.63) this becomes 



2 ^ M ^G DlsDl 



(1.72) 



We now define the critical surface density, 



£ c (z) 



c 2 ^ 



and the convergence 



4irGD L (z)D LS (z) 



(1.73) 



(1.74) 



Now summing all the mass-sheets in ( |1.72[ ) gives the relation between the scaled lensing 
potential and the convergence 



V2V(M») = 2K(0,*a 



(1.75) 



Solving this two-dimensional differential equation gives the effective potential in terms 
of the convergence: 

1 1 ' ' e'\d 2 e' (i.76) 



Mz) = - I K(ff,z)]n\ 

TT 



1.7 Weak Gravitational Lensing 



The local properties of the mapping in (1.60) are contained in its Jacobian matrix, given by 

0V 



de 



11 ae 4 J~ {%3 



dOidOj 

where i,j index the two components of the angular position. 
Introducing the abbreviation 



(1.77) 



d 2 il> 

13 ~ dOidOj 



(1.78) 



We can then rewrite the convergence k (eqn 1.75 ) and define the complex shear 7 = 71 -M72 
of the mapping and write: 

K = (^il+^2 2 )/2 

71 = (^l-«/2 (1-79) 

72 = V>21 = ^12 
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The local Jacobian matrix (1.77) of the lens mapping can then be written 



A 




(1.80) 



Equations 1.76, 1.78 and 1.79 can be combined and simplified to yield the following 
relationship between the convergence and the shear, where for simplicity we define the 
complex angle 9 = 9\ + iO^ 



1(0) 



-1 



7T 



v{e - e')K(9')d 2 e' 



where 



[) (61 + 61Y ~W 



(1.81) 



(1.82) 



is the Kaiser-Squires kernel (Kaiser & Squires, 1993). The lens mapping in eqn. 1.80 de- 



scribes an image transformation consisting of a magnification with magnitude given by the 
real convergence k and a distortion with magnitude and orientation given by the complex 
shear 7 = 71 + 272- This distortion results in a measurable effect, at least in principle. If 
the intrinsic shape, size, or brightness of a distant image were known, then the observed 
shape, size, or brightness could be observed to determine the shear and convergence at that 
point. Unfortunately, the intrinsic shape and size of a galaxy cannot be known a priori, but 
using well-founded assumptions about the statistics of the distribution of shapes and sizes 
of sources can lead to useful estimates of the shear and/or convergence across the sky. 

In the most common approach to weak lensing, the ellipticities of source galaxies are 
measured, giving a noisy estimate of the reduced shear 



1 - k(6) ' 



(1.83) 



In the weak limit where k(9) <C 1, analyses often assume j r (9) ~ j(9), though with higher- 
precision measurements, this second-order effect can introduce systematic errors in mass 



maps and power spectra (Dodelson et al. 2006; Shapiro 2009 Krause & Hirata 2010). 



Once the shear field is estimated, the measurements can be utilized in a number of ways to 
learn about fundamental physical principles; this work will focus on three areas: 
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Direct mapping: Having measured the shear 7 at locations across the sky, the convergence 
k can be estimated, k relates to the projected density via eqn. 1.74| Thus the measured 



shear can be used to directly estimate a map of the distribution of dark matter in two 
dimensions. Using redshift information for the lensed sources, there is the potential 
to extend this mapping to three dimensions. This is the subject of Chapter 3. 

Peak statistics: The two dimensional maps recovered as above represent a two-dimensional 
projection of the three-dimensional distribution of large scale structure, in particular 



massive galaxy clusters. As discussed in {1.5, both the number of clusters and their 



mass distribution depend on the details of the geometry, expansion, and makeup of 
the universe. By computing the statistics of observed lensing peaks to that predicted 
by theory, it is possible to constrain cosmological parameters using the peaks alone. 
This is one of the subjects reviewed and explored in Chapter 4. 

Power spectrum: The power spectrum of the shear is closely related to the power spec- 
trum of the matter distribution that generates it. By measuring two point information 
of observed shear, it is possible to constrain cosmological parameters in a way that is 
complementary to the peak counts mentioned above. This is the subject of Chapter 
5. 

To enable these three analyses, we will develop a bit further the basic principles of weak 
lensing mappings and power spectra. 

1.7.1 Mapping with Weak Lensing 

The tomographic approach to the 3D lensing mapping problem can be computed using the 



following steps (see, e.g. Hu & Keeton 2002; Simon et al. 2009 VanderPlas et al. 2011): 



1. Prom the measured ellipticities and redshifts of photometrically observed galaxies, 
obtain noisy estimates of the shear 7o6 s (#, z). 



2. Using eqn. 1.81 , recover an estimate of k(9, z). Note that due to the integral over the 
lensing kernel T>(9), the convergence estimate is non-local: the value of k at a given 
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location is related to the value of the 7 at all other locations. 



3. Using eqn. 1.74, determine the projected density £(#, z). 



4. As a final step, it is possible in principle to use eqn. 1.62 to recover the 3D mass 
density p(9,z). This is the subject of Chapter 3. 



To accomplish this, it is convenient to combine steps 3-4 and write the expression for k(6, z) 



in terms of p(6,z) explicitly. From (1.62) and (1.74), approximating the sum as an integral, 
we find 



- [*° DW(z)[DW(z 3 )-DM(z)] , dD( A \z) A 
K (9,z s ) = 4,Gj o MM) ___d* 



(1.84) 



The notation has been changed here to make clear that the distances in question are in fact 
angular diameter distance, the relevant distance in the context of lensing calculations (See 



{ 1.3). Recall that angular diameter distance is related to the comoving distance D by 



D^ A \z) = aS K (D) 



(1.85) 



where S K (D) = D for a flat universe. Assuming a flat universe, converting to comoving 
distances, and writing this in terms of e = pc 2 , we find 



- 4vrG f" dD 2 D{D S -D) f3 > 
k{V,z s ) = —5- / dz—a e M {0,z), 



c Jo dz D s 
where we've used the shorthand D = D(z) and Ds = D(z, 



(1.86) 



To further progress, we can follow £|1.5|and write the matter density £m{8, z) hi equation 



1.86 in terms of the density contrast 5: 



e M (0,z) = n M (z)£ c (z) l + 5(9,z) 



(1.87) 



where we have assumed a flat universe, such that the total density is equal to the critical 



density £ c (z) (eqn. 1.23). We'll make use of two further algebraic substitutions: from the 



definition of comoving distance (eq. 1.28), we can write 



dD 



dz H(z)' 



(1.88) 



28 



and from the Friedmann equation (eqn. 1.27) matter density fraction can be written 



VL M {z) 

Combining these equations gives 



K(z a ) 



3cH2tt 



1 'A/,0 



2 J dZ H(z) D s 
Because of the mass-sheet degeneracy, k(z s ) can only be determined up to an additive 



[H(zW ' 
1 + z) D{D S - D) 



(1.89) 



[l + 6(z) 



(1.90) 



constant across a given redshift bin (see Seitz & Schneider, 1996, for discussion). Assuming 
the observed field is large enough to average the effects of cosmic variance, the additive 
constant will be due simply to the background matter distribution. Defining R(z s ) to be the 
convergence due to the background matter distribution in matter-dominated growth, and 
A(z) = 6(z)/a we find 

3cH^ M ,o f Zs ^ 1 D{D S -D) 



k(z s ) = 
where, to be explicit, 



K(Z, 



k(z s ) 



k(z s ) 



H(z 



s 'A/,0 



(l + z)D(D s -D) 



-A(z) 



dz 

2 Jq H(z) D S 
To be clear, here, D is the comoving distance to a redshift z, and Ds is the comoving 



(1.91) 



(1.92) 



distance to the redshift z s of the photon source. Equation 1.91 defines the mapping from 
k(z s ) to A(z) for z < z s . 

1.7.2 Power Spectra 

Mass mapping can lead to deep astrophysical and cosmological insights through the compar- 



ison of dark and luminous matter distributions (e.g. Clowe et al. , 2006), through constraints 



on the mass profiles of collapsed structures (e.g. Oguri et al. 2012), or through the com- 
parison of observed mass peaks to theoretical predictions (see Chapter 4). Because of the 
noise inherent in lensing observations, most of these localized analyses are limited to very 
dense regions, far from the linear regime. 

The linear regime, as well as the presence of nonlinear effects on small scales, can be 
measured using power spectra of the weak lensing shear. In order to accomplish this, 
however, the power spectra of observed shear must be related to the mass power spectra 



discussed in £1.5 
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E and B modes 



In this section, we will outline the basic results of Schneider et al. (2002b). We will start 



by defining the E/B decomposition of the shear field 7. If the shear 7 and convergence k 



can be expressed as shown in eqn. 1.79, then the gradient of k can be written 



VftK 



dn/d9i 
dK/d6 2 



071/001 + 072/002 
072/001 - 07i /002 



= u. 



(1.93) 



If k and 7 are due entirely to weak lensing, then the vector u should be a pure gradient 



field, as will every quantity in the equality in eqn. 1.93 This condition can be compactly 
expressed by noting that the curl of a gradient is identically zero: 



V x u = 0. 



(1.94) 



If, however, other effects are involved (e.g. shot noise, second-order lensing effects, intrinsic 
alignments, systematic errors, etc.) then u will not be a pure gradient field and will have 
a nonzero curl. With this in mind, we will use an analogy from electrodynamics and 
decompose k into a curl- free "E-mode" ke and a divergence- free "B-mode" Kb such that 



V 2 k e 
V 2 k b 



V • u 



V x u. 



(1.95) 
(1.96) 



We'll also define the E-mode and B-mode lensing potential following eqn. 1.75 



V 1pE,B = 2ke,b- 



(1.97) 



This allows us to define the E and B modes of 7 via eqn. 1.79 Explicitly, 



1E,B 



d 2 



o 2 



001001 002002 



(1.98) 



Combining the convergence E and B modes as a complex linear combination k = ke + ikb, 



we find in analogy to eqn. 1.81 



\lE{e) + iib{0)] = j_V{6- 0') [k e (0') + iK B {0')] d 2 0' 



(1.99) 
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We can define the Fourier transform of the convergence 



k E , B {t) = I d 2 6e i£ - e K EjB (0), 



(1.100) 



where £ is the angular Fourier variable. We can then define the power spectra (e.g. Schneider 



et al. 2002b) 



(«e,bW«er(0) = (2tt) 2 8 d (£-£')P e ^)- 



By the convolution theorem, the Fourier transform of eqn. 1.99 gives 



\£f 



ke,b(£)- 



The factor relating 7 and k can be expressed as a simple phase e 2l @ , so that 



(1.101) 



(1.102) 



(1.103) 



Based on this equality, we can define the shear correlation function and relate it to the 
convergence power spectra Peb, 



e+(0) = <7(O)7*(0)> 
roc M o 







2?r 



J {£9)[P E {£)+P B {£)), 



(1.104) 



where Jq(x) is a Bessel function of the first kind. The "+" distinguishes this correlation 



measure from two other shear correlations that can be defined, £_ and £x (see Schneider 



et al. , 2002a, for details). We will limit the discussion here to £+, because this is the relevant 
measure for our purposes (See discussion in £ |4.2.4[ ). 

The E-mode angular shear power spectrum Pe(£) can be expressed as a weighted line- 
of-sight integral over the matter power spectrum via eqn. |1.90| Taking into account the 



redshift distribution of galaxies gives (see Takada & Jain , 2004 ) 

P E {£) = J drW 2 {r)r' 2 P 5 [k = -;z(r] 



(1.105) 



Here r is the comoving distance, r s is the distance to the source, and W(r) is the lensing 
weight function, 

w . 3n m , H 2 r [*M r(z)-r 

W(r) = -r-v^— / dz n(z) . , — (1.106) 

■• ' " r{z) 



2a(r) n g J z{r) 
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where n(z) is the empirical redshift distribution of galaxies, with n g = J* °° n(z)dz. This 
allows us to predict an analytic relation between the 3D mass fluctuation power spectrum 
Ps(k, z) and the correlation function of the shear signal z). The nonlinear mass fluctu- 



ation power spectrum P$(k,z) can be predicted semi-analytically (e.g. Smith et al. (2003)) 
and has a form dependent on the assumed cosmological model. The shear correlation func- 
tion can be computed from observed data. The important point is that this relation 
provides a direct (if noisy) measure of the distribution of mass: it requires no assumptions 
about the mass to light ratio or how the non-baryonic matter distribution relates to that 
of luminous matter. The observations can be related directly to theoretical expectations 
for the nonlinear power spectrum. For this reason, weak lensing studies provide a pow- 
erful probe for constraining the cosmological parameters that describe the geometry and 
dynamics of the Universe. 

In the following sections we undertake a systematic study of gravitational lensing ap- 
plications enabled by Karhunen-Loeve (KL) analysis. We begin in Chapter 2 with a devel- 
opment of the mathematical theory behind KL analysis. In Chapter 3, we explore the use 
of a KL-based method for reconstruction of three-dimensional mass maps from shear data. 
In Chapter 4, we explore how KL can be used to address incompleteness in shear surveys, 
allowing us to interpolate the signal across masked regions. In Chapter 5, we use KL as 
a basis for computing parameter constraints from two-point statistics of a shear field with 
incomplete sky coverage. 
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Chapter 2 

INTRODUCTION TO KARHUNEN-LOEVE ANALYSIS 

Karhunen-Loeve (KL) analysis is a commonly used statistical tool in a broad range of 
astronomical applications, from, e.g. studies of correlations in observed properties of galaxy 



photometry (Efstathiou & Fall, 1984) and galaxy and quasar spectra (Connolly et al. 1995 



Connolly & Szalay, 1999 Yip et al. 2004a|b ), to analysis of the spatial distribution of 



2000; 


Szalay et al. 


2003, 


Pope et al. 



2004 ; Tegmark et al. 2006 ) , to characterization of the expected errors in weak lensing 



surveys (Kilbinger & Munshi, 2006 Munshi & Kilbinger, 2006). In this chapter, we will 



develop the formalism of KL that will form the basis of the applications in the subsequent 
chapters. 

The KL formalism requires the liberal employment of algebra with vectors, scalars, 
matrices, and their generalizations. For clarity, we will begin by briefly specifying the 
notational conventions used in this chapter and throughout this work. 



2.1 Notational Conventions 

It is important to clearly distinguish between vectors, matrices, and scalars in the following 
formulation. Vectors will be denoted by bold lower-case symbols; e.g. x. Matrices will be 
denoted by bold upper-case symbols; e.g. X. Scalars will be denoted by non-bold symbols, 
either upper or lower-case. All vectors are assumed to be column vectors, while a row- vector 
is indicated by the transpose, x T . Single elements of a given vector or matrix are given with 
subscripts: xi is the i th element of the vector x, and Xij is the element in the i th row and j th 
column of the matrix X. The vector making up the j th column of X is indicated by x^). 

Note then, that by this convention, the component of matrix X can be equivalently 

(i) 

expressed Xij or x\ . 

In algebraic expressions, the normal linear algebra rules are assumed. For example, the 
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expression 

y = Mx + b (2.1) 

involves the vectors y, x, and b and the matrix M. This expression is short-hand for the 
summation: 

n 

Vi = Y^ M iJ X i + b i ( 2 - 2 ) 
3=1 

Using these rules, we can define the magnitude of a vector 

| sc | = Vx T x (2.3) 
2.2 Basis function decomposition 

KL analysis is simply a basis function decomposition, where the basis functions are derived 
based on the variance and covariance properties of a class of functions. We'll start by 
describing what is perhaps the best-known basis function decomposition, the Fourier series. 
We start here because it's a familiar concept that generalizes well to the fundamental ideas 
of KL analysis. 

2.2.1 Fourier Series 

The Fourier Series is a means of expressing a bounded function in terms of a certain class 
of oscillatory basis functions. It is a discrete version of the Fourier Transforms used in 



cosmological power spectrum analysis, and discussed in {1.7.2 
We'll define a set of oscillatory basis functions 



V t b — to, 



2mk(t - to) 



(2.4) 



tb — t a 

where t is the arbitrary dependent variable, k is the wave-number, and the function is 
defined in the region t a < t < t\,. We'll postulate that a function f(t) can be expressed as 
a linear combination of these basis functions: 

oo 

/(*)= E a ^k{t). (2.5) 

k=— oo 

Here are an infinite set of complex coefficients. Our claim is that any piecewise continuous 
and square-integrable function f(t) in the interval [t a , tb] can be represented this way. A 
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rigorous mathematical proof of this statement can be found elsewhere, but below we will 
lend support to this claim. 

Given the claim that Equation |2.5| holds, how can we compute the Fourier coefficients 
a k associated with a particular /(£)? Though the expression is well-known, we'll briefly 
derive it here because it illuminates some of the properties of Fourier transforms that will 
generalize to KL transforms. 

To begin, we'll multiply both sides of Equation 2.5 by the complex conjugate & k ,(x) of 



the basis function given in Equation 2.4, and integrate both sides over t from t a to 



(2.6) 



k=— oo 

On the right-hand side, we can exchange the order of integration and summation to find 



t a 



$* k ,(t)f(t)dt= 



$um h {x)dt 



ta 



(2.7) 



Let's examine the term in the square brackets. Plugging in the definition of the basis 

2wi{k - k')(t - t a Y 



functions from Equation 2.4, we have 

ft b 



1 

tb — ta 



exp 



h — t a 



dt 



(2i 



This gives two distinct situations for the integral on the right-hand side: when k = k', 
both the integrand and the term in the brackets is exactly 1. When k ^ k' , the integrand 
oscillates through an integer number of cycles between t a and t\, (remember that k and k! 
here are integers), and the result of the integral is exactly 0. So we see that the term in 
brackets is equal to simply the Kronecker delta function 5 k y, defined as 



(2.9) 




Putting this result into Equation 2.7, only one term of the sum remains and we find 



1 



(ik 



tb — t a 



m)f(t)dt 



(2.10) 



Equation 2.10 shows how to compute the Fourier coefficients a k for a given f(t). But 
one might wonder if this is a unique result. Could there be several possible sets of valid 
Fourier coefficients for a given function? 
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Let's assume that given a function f(t), there are two valid sets of Fourier coefficients 
afc and a' k that satisfy Equation 2.5 In this case, subtracting the two equations gives 



/(*) " fit) = £ (o fc - a' k )$ k (t) = 0. (2.11) 

k=— oo 

In a similar manner to above, we can multiply by 3>£(i), integrate over t from t a to t^, and 
extract a Kronecker delta function to yield 

oo 

(°* - a 'fc)^' = ( 2 - 12 ) 

fc=— oo 

Collapsing the sum, we find that a k = a' k for all k. This shows the uniqueness of the Fourier 
coefficients a k for a given function f(t) on an interval [t Q ,ib]. Thus, given an orthonormal 
basis <I>fc, there is a single unique linear combination that reconstructs a function f(t) on 
the defined interval. 

2.2.2 Generalizing Orthonormal Bases 

Stepping back for a moment, we have shown that for a particular class of basis functions 



one can find unique coefficients such that one of the expansions of Equation 2.5 
holds. A key observation is that all the derivations above rested solely on two special 
properties of these basis functions: 

1. The basis functions are orthonormal on the interval [a, b]. That is, satisfies 

[ " <S> k (t)<5>l,{t) = 5 kk , (2.13) 

Jt a 

2. The basis functions are complete on the interval [t a >i&]- That is, an arbitrary function 
f(t) can be approximated by 

N 

f(t) = Y J a ^k(t) (2.14) 

k=l 

and the mean square error satisfies 



lim / 



N -> 2 



fe=i 



dt = 0. (2.15) 
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As long as these two properties hold for a class of functions $fc(£), we would be able to 



repeat the above derivations and express any f(t) via Equation 2.5 This suggests the 
possible existence of other functions that fit these criteria. Some examples are the Legendre 
polynomials on the interval [—1,1], the Laguerre polynomials on the interval [0, oo), and the 
Hermite polynomials on the interval (—00,00). In fact, these different orthonormal basis 
functions are simply a generalization of well-known geometric bases (such as the x and y 
axis of a two dimensional vector space) into an abstract function space. Just as there are an 
infinite number of possible orientations for an (x, y) axis in a two-dimensional vector space, 
there are an infinite number of possible orthogonal function bases that work in the above 
formalism. Choosing the right basis can lead to a much easier analysis of a given problem. 

2.3 Karhunen-Loeve Analysis 

Because of the infinite number of possible orthogonal function classes, one might wonder 
how to choose the optimal class for any particular problem. Karhunen-Loeve analysis seeks 
to answer this question in a very general case. 

2.3.1 Derivation of Karhunen-Loeve theorem 

Imagine now that we have a random process Ft. This can be thought of as an arbitrarily 
large collection of functions f^(t) defined on the interval t 6 [a, b]. At a given location t, 
the expectation value of the random process is given by 

E[F t ]=tim ^ ( 2 - 16 ) 

iV— s>oo iv * — ' 
i=l 

For simplicity, we'll assume that the random process Ft is centered; that is E[Ft] = 0. A 
general random process can be centered by subtracting the expectation value for each t. A 
centered random process Ft can be characterized by its covariance function, which is defined 
as 



C F (t,t') = E[F t Ff] 

1 N 

= &lirE/ (0 W/ W (O- (2-17) 



i=l 

For an uncorrelated random process, Cp(t, t') = VF(t)5t t t> where Vf(£) is the variance of Ft- 
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2.3.2 Eigenf unctions 

We'll now introduce the eigenf unctions e k {t) of the covariance function C F (t, t'), which 
satisfy 

rti, 

C F (t,t')e k (t')dt' = X k e k (t) (2.18) 



subject to the constraint that J^ b \e k {t)\ 2 dt ^ (i.e. e k (t) is not everywhere zero). Here X k 
is the eigenvalue associated with the eigenfunction e k (t). 

Now what are the properties of these eigenfunctions? First of all, they are orthogonal 
on the interval [i a ,ib]. We can show this by considering two arbitrary eigenfunctions e k (t) 
and ey (t) . Consider the quantity 

rib fib 



I 

Jta 



dt dt' C F (t,t')e k >(t')e k (t) (2.19) 

Jta 



' t a J t a 

Because of the symmetry of the covariance, i.e. C F (t, t') = C F (t', t), and because the order of 
integration can be switched, this can be evaluated two different ways, which must be equal: 

f" dt e k (t) [" dt' C F (t,t')e k ,{t') = r dt' e' k (t') r dtC F (t',t)e k (t) 

f " dt X k >e k {t)e k ,{t) = f" dt' X k e k {t')ey{t') (2.20) 

Jt a Jta 

Rearranging the bottom line leads to 

(Afc-AfcO f" dte k (t)e k/ {t) = (2.21) 

■Jt a 

So for Afc X k i, then e k and e k > must be orthogonaQ From the definition in Equation 2.18 



we see that if e k (t) is an eigenfunction with eigenvalue then for any arbitrary constant K, 
K e k (t) is an eigenfunction with eigenvalue X k as well. To make the choice of eigenfunction 

more definite, we will assume all eigenfunctions are normalized: that is 

h 

dte k (t)e k ,(t) = 6 kk > (2.22) 

ta 

1 In the degenerate case when Xk = \y , one can still construct orthogonal vectors by linear combinations: 

e / 1 \ _ e k (t) + ey{t) 

+ [) ~ V2 
e _/ f N = e k (t) - e k ,{t) 
v/2 

This leads to two new orthogonal eigenfunctions e+ and e_ with the same eigenvalue Xk- 
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for all k. This still allows any eigenfunction to have an arbitrary phase: that is, an eigen- 
function may be multiplied by e ld for any theta, and still satisfy our orthogonality condition. 
This fact will become important later. 

The net result is that the eigenfunctions e k (t) form an orthonormal basis for the space of 
functions represented by the random process Ft- In general, the eigenfunctions also satisfy 



the completeness relation (see eqn. 2.15). The proof of the completeness of eigenfunctions 



for a symmetric kernel Cp{t,t') is rather involved, and can be found in, e.g. Courant & 



Hilbert (1989). 



Let's now consider the expansion of the random process Ft onto the eigenvectors e k (t). 
Analogously to the Fourier case discussed above, we have 



F t = Y / A k e k (t). 



(2.23) 



k=l 



(i) 

where here A k can be thought of as a set of coefficients a k in the same way that the random 
process F t can be thought of as a set of functions f^(t). Multiplying both sides by ey(t), 
integrating, and using the orthogonality of eigenvectors (this is analogous to the derivation 



in Equations 2.7 2.10) leads to 



A k = [ b F t e k (t)dt (2.24) 

Jt a 

Because of the fact that the random process is centered (i.e. E[Ft] = 0), it is straight- 
forward to show that -ELAfc] = as well. The more interesting computation is that of the 
covariance of the eigenvectors, C^(fc, k'). From Equations 



2.17 



and 



2.24 



we have 



C A (k,k') 



E[A k A k < 



E 



dt / dt'F t e k {t)F v e k ,{t') 



ta 



tb 



ta 



ta 



dt / dt'E[F t F t ,]e k ,(t')e k (t) 



dt 



ta 



dt'C F (t,t')e k ,(t')e k (t) 



Substituting Equation 2.18 we find that this gives 

C A (k,k') 



dt\ k ,e k i(t)e k (t) 



(2.25) 
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So we see that projection of the centered random process Ft onto the eigenvectors of its 
covariance matrix yields coefficients which are uncorrelated, with variance equal to the 
eigenvalues Afc. This result is the Karhunen-Loeve theorem, and it has many ramifications 
that will be discussed below. 



2.3.3 Partial Reconstructions 



We have shown that Karhunen-Loeve provides an orthonormal basis for a random field 
with uncorrelated projection coefficients. We can go further and show that Karhunen- 
Loeve provides the optimal orthonormal basis for low-rank approximations of functions in 
a random field. 



Above, we expressed the completeness relation for a single function fit) (eqn. 2.15). 



For a random process, an orthonormal basis 4>k(t) is complete if and only if there exists a 
random process B^ such that 

-i 2 



lim 



'Jtn 



N 



Ft-Y^BkMt) 



fc=l 



dt = 0. 



(2.26) 



A random process is low-rank if and only if there exists a complete orthonormal basis 4>k it) 
such that E[Bl\ = for one or more values of k. In other words, a random process is low- 
rank if for some basis (ftk(t), some values of k are not required for a perfect reconstruction 
of the function. 

Let us consider an arbitrary complete orthonormal basis (pk(t), with Ft = Y2h=i Bk<t>k(f)- 
Given this basis, we'll define the low-rank approximation of Ft 



N 

F r=E B Mt) 



We'll seek to minimize the expectation value of the squared error 



(2.27) 



£ 2 



N 



dt 



ta 



B k <Pkit) 

.k=N+l 



dt. 



(2.28) 
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Expanding the sum leads to 



oo oo 



f 2 



E £ ., 



t,, 



[B k B k ,<p k (t)<i> k ,{t)]&t 



/I E B k B k i8 kk i 

k=N+l k'=N+l 



E b. 

k=N+l 



(2.29) 



Taking the expectation value and plugging in the equivalent of Equation 2.24 for B k , we 
find 



E[£ 



E 



/ dt / dt'F t F t ^ m (t)^ m (t') 

,k=N+l Jta 



E 



dt 



dt'C F {t,t')(l) m {t)(t> m {i!) 



(2.30) 



fc=7V+l " h * 

We'd like to minimize this expected error over the basis 4> m (t), subject to the constraint 
that 4>m{t) are an orthonormal basis. We'll accomplish this by the method of Lagrange 
multipliers. Our Lagrangian is 



£({*(*)»= E 

k=N+l 



t a 



tb 



dt / dt'c F {t,t')4> m {t)4> m {t') - Ait l 



,(t) 2 dt 



(2.31) 



Minimizing this with respect to <p k gives 

dC ' db 



dt'C F (t,t')<f> k {t') ~ hfa(t). 



(2.32) 



ta 



dij> k {t) 

The optimum for each k is where this derivative equals zero; setting to zero and solving 



recovers the original eigenvalue problem (Eq. |2.18 ) from which we derived the KL basis 
{e k (t)}. By the uniqueness of the eigenvalue decomposition, this shows that the KL basis 
is the optimal basis for low-rank approximations of functions drawn from Ft- Furthermore, 
for an approximation using N eigenvectors, the mean squared error is given by 



E E ^ 

k=N+l 

oo 

E 

k=N+l 



(2.33) 
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This is an interesting result: it says that in order to minimize the expectation value of 
the reconstruction error £ jy for all N, we simply need to order the eigenvalues such that 
Afc > Xk+i for all eigenvalue-eigenfunction pairs (Afc,efc(i)). 

Because of this, throughout this work we will follow this convention when ordering the 
eigenvalues in a KL decomposition. 

2.3.4 i n the presence of noise 

In practice, the observed random field Ft is composed of the sum of a signal St and noise 
Nt- We'll continue to assume that both of these are centered. The covariance matrix then 
becomes 

C F (t,t') = E[(S t + N t )(S t , + N t ,)} (2.34) 

Under the assumption that the signal St and noise Nt are uncorrelated, this can be simplified 
to 

C F (t,t') = E[S t S t '} + E[N t N t r] 

= C s (t,t') + C N (t,t') (2.35) 

The Karhunen-Loeve eigenfunctions always diagonalize the full covariance Cf(£, t'). In the 
case of uncorrelated "white" noise, Cj^(t,t') = a 2 5(t — t 1 ) and both the signal and the noise 
become diagonalized. In this case, the noise per mode is a constant a 2 , and the ranking 
of the eigenfunctions leads to modes which are ranked in signal-to-noise. This is why KL 



modes are often referred to as "signal-to- noise eigenmodes" (Vogeley & Szalay, 1996). 

In cases when the noise is not white, we can still recover signal-to-noise information 
by preprocessing with a whitening transformation. The eigenvectors nfc(t) of the noise 
covariance, which satisfy 

/ " C N (t, t')n k (t')dt' = a ink {t) (2.36) 

Jta 

can be used to apply a whitening transform to both the signal and the noise. The whitened 
covariance is given by 

cf\k,k') ss [\t /VcHt,O nfc(tK,(f) 

Jt a Jt a °~kO-k> 

= Cf\k,k') + 5 kk ,. (2.37) 
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The signal is now expressed in the basis of the noise eigenmodes n^{t), and the noise has 
been made white with unit variance. The KL modes derived from the whitened covariance 
C^\k, k 1 ) will have eigenvalues that represent the signal-to-noise in each mode. 

2.3.5 Karhunen-Loeve: theory to practice 

The abstract formalism presented above is interesting in itself, but one might wonder what 
practical advantages can be gained from this discussion. In practice, we don't deal with an 
abstract stochastic process Ft, but with discrete, measured data. For this reason, the con- 
tinuous formalism from above can be transformed into a discrete linear algebraic formalism, 
as seen below. In this section we will discuss the practical computational aspects of KL 
analysis. 

Imagine, for the moment, that an astronomer has observed the spectra of N galaxies. 
After normalization and correction for redshift effects, the spectra can be encoded as a 
series of N real- valued functions /®(A) over some defined domain A £ [a,b]. In practice, 
we measure these spectra at a finite set of wavelengths X T = [Ai, A2 • • • Am] so that our 
observations /® become M-dimensional vectors. For convenience, we'll store these spectra 
in a N x M matrix F = [f^\ / ^ • • • f^] T , where each row of the matrix represents one 
spectrum. These series of spectra F can be considered a finite realization of a particular 
random process F\. 

The expectation value -Ef-Fx] can be approximated via the sample mean: 

1 N 

E[F x ]Kf = -J2f ii} (2-38) 

i=i 

and the covariance function can be approximated by the covariance matrix 

E[F\F\i] ^C F = j^zrj^F (2-39) 
where we have defined the centered matrix 

F = F — l N f. (2.40) 



and ljv is the length- iV vector of ones. The — 1 in the denominator of Equation 2.39 
is called the Bessel Correction, and results from the reduced number of degrees of freedom 
after the mean is subtracted. 
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We can approximate the eigenfunctions e^(A) and eigenvalues Aj via the diagonalization 
of Cf, computed using standard linear algebra techniques. The diagonalization of the 
covariance matrix is 

C F = VAV T , (2.41) 

where the columns of the matrix V are the eigenvectors (such that V T V = I, the identity 
matrix), and A is the diagonal matrix of eigenvalues, such that Ay = \5ij. In practice, 
the eigenvalues and eigenvectors can often be computed more efficiently via a singular value 
decomposition: 

U'EV T = ; 1 F (2.42) 

where the orthogonal matrices U and V are called the left and right singular vectors, 
respectively, and S is a diagonal matrix of singular values. One can quickly show from 
Equations [2739] and [2T42] that 



C F = V£U T ITEV T 

= VY?V T . (2.43) 



Comparing to Equation 2.41 we see that A = S , and the eigenvectors V are identical to 



the right singular vectors of Equation 2.42 up to an arbitrary ordering of columns. Ordering 



our eigenvalues according to the rule in £2.3.3 takes care of this uncertainty. 



The columns of V are the eigenvectors, and are the discrete representation of the eigen- 
functions e^(A). These eigenvectors satisfy all the properties of the KL bases discussed 
above: they diagonalize the sample correlation matrix, they provide the best possible low- 
rank linear approximation to any spectrum from the sample, and in the presence of uncor- 
related noise, they allow an orthogonal decomposition onto a basis with a natural ranking 
in signal-to-noise. 

In the case of the spectrum example, we have no theoretical expectation for the cor- 
relation matrix C(A, A'), so we are forced to approximate the matrix based on the sample 



correlation using Equation 2.39. If we had sufficient physical understanding of every process 
at work in each galaxy, it might be possible to compute that correlation matrix from theory 
alone. The number of variables involved, however, make this prospect near impossible. 
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There are other situations in astronomical measurement, however, when a theoretical 
expectation of the correlation matrix is possible in practice. We will see in the following 
sections how the correlation matrix of particular cosmological observations can be approxi- 
mated from theory alone. 

2.3.6 KL with missing data 

Along with discrete data samples, another challenge when applying KL to real data is the 
presence of missing data: given a KL basis, how can one derive the projected coefficients 
when data are missing? Note that in this section we'll assume that the KL basis has been 
obtained independently. It is also possible to derive a KL eigenbasis from incomplete data 



using an iterative approach: see, e.g. Connolly & Szalay (1999); Yip et al. (2004a). 



To begin, we'll assume that we have an observed object represented by the .fT-dimensional 
vector x = [xi, X2 ■ ■ ■ xk] T and a set of normalized KL basis functions V = [vx, v% ■ ■ • Vk\. 
arranged in order of decreasing eigenvalue Aj. We have shown above that the best rank- AT 
linear approximation of x is given by 

N 
i=l 

where the coefficients can be calculated as 

a, = Vi T x. (2.45) 

When x has missing data, however, the question of how to compute cij is not as straightfor- 
ward. Simply setting the missing values to zero will not work: the reconstruction will then 
faithfully recover those zero values. We desire instead to constrain the expected contribution 
of each eigenvector to x while ignoring the contribution of the missing data. 

A simple solution may be to simply truncate the vectors such that the dot product is 
only computed over unmasked values. It is easy to see that this is identical to the "set to 
zero" solution just discussed. Furthermore, there is the problem that in general, a set of 
bases truncated in this way does not retain its orthogonality. 

Another approach may be to derive a new basis which is orthogonal over the truncated 



space. This is similar in spirit to the method explored by Gorski (1994) in analysis of CMB 
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data. While this leads to a complete orthogonal basis for the observed portion of the field, 
coefficients of these new modes have no simple relationship to coefficients of modes covering 
the full field. In general, a rank-deficient transformation matrix must be inverted in order 
to convert between the two. 

We'll use a different approach. We have shown above that when noise is not present, the 
KL vectors define the optimal basis for rank-iV reconstruction in the least-squares sense. 
That is, for an arbitrary truncated orthonormal basis ^(jv)j (where truncated means we use 
only the first N columns of with < N < K), 



x 2 



2 



(2.46) 

is minimized on average when 3> = V(N) = [ v i ; v 2 ■ ■ ■ v n] , N < K. Here we flip the 
problem: we know the desired basis V(n), and hope to find the optimal vector of coefficients 
a,( N > which minimizes the x 2 i n the presence of the missing data (Connolly & Szalay 1999). 
We'll define a diagonal weight matrix W such that Wij = Wi5ij, where Wi = 1 where X{ is 
defined, and w% = where xi is missing. Our expression to minimize then becomes 

X 2 (« (Af) ) = (x - g a^vw) L-J2 a iN)ti v(A . (2.47) 

To minimize this with respect to the coefficients a(jv),i, we differentiate and find 

a 2 N 

= -2x t W 2 v^+2y a(N) lV ^ T W 2 v^. (2.48) 
Setting the derivative to zero and combining terms gives 

N 

X T W 2 V H) = J- af> v U) T w 2 v<» . (2.49) 
j=i 

If there are no areas of missing data, then W = I and we simply recover a, = Vi T x, our 
standard expression for finding the KL coefficients with no missing data. In the general 
case, however, because the inner-product of d'*' and v^' is modulated by W, there is no 
delta function to collapse the sum. 

We can simplify this notation by defining the correlation matrix of the mask Mn, such 
that 

{M N )ij = v^ T W 2 v^ (2.50) 
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so that eq. 2.49 can be compactly written 

x T W 2 V (N) = a {N)T M N . (2.51) 
From this, we can quickly see that the optimal set of coefficients is given by 

a W = [M N ]- l Vf N) W 2 x. (2.52) 



This can be viewed as a generalized form of the expression in eq. 2.45: if W is set equal 



to the identity matrix (indicating no missing data), then Mn is also the identity and we 



recover eq. 2.45 exactly. 

If some of the diagonal entries in W are zero, then the correlation matrix Mk for the full 
set of K eigenvectors is rank-deficient and cannot be inverted as required for eq. |2.52[ For 
this reason, it is essential to use the truncated eigenvectors Vtm, with N < rank(W). It is 
not strictly necessary to discard the eigenvectors corresponding to the smallest eigenvalues, 
but this choice leads to the highest signal-to-noise result. For the simple binary masking 
case where W is a diagonal matrix consisting of zeros and ones, the rank of W is equivalent 
to its trace, or the sum of nonzero diagonal terms. 

Once these approximate KL coefficients a,( N ' are determined, it is straightforward to use 
these to approximate the unmasked vector x: 

N 

£a?V*). (2.53) 



x 

i=0 



Here we have used the coefficients determined from the unmasked region of the data to 
constrain the unobserved value in the masked regions. 

This could be further generalized by allowing W to be an arbitrary matrix, for instance 
encoding the inverse of the noise covariance associated with the observed vector x. In 
unconstrained regions, the noise is infinite and the inverse is zero. This leads to very similar 
results to those expressed here (in this case, W 2 is replaced by W T W). This is equivalent 



to the whitening operation discussed in { 2.3.4 We will not use this formalism here, so we 
leave it only as a suggested extension. 
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2.4 Karhunen-Loeve Analysis and Bayesian Inference 

Because of the signal-to- noise optimality properties of KL, it can be very useful within 
Bayesian parameter estimation. Given observations D and prior information /, Bayes' 
theorem specifies the posterior probability of a model described by the parameters 

Pm}\DI) = P({g i }|/) P ™ < j^ /) (2-54) 

The term on the left hand side is the posterior probability of the set of model parameters 
{Oi}, which is the quantity we are interested in. 

The first term on the right side of the equation is the prior. It quantifies how our 
prior information / affects the probabilities of the model parameters. The prior is where 
information from other surveys (e.g. WMAP, etc) can be included. The likelihood function 
for the observed coefficients a enters into the numerator P{D\{9i\I). The denominator 
P(D\I) is essentially a normalization constant, set so that the sum of probabilities over the 
parameter space equals unity. 

KL is useful in the case where the model {9} can be expressed in terms of a covariance 



C{0} (see Vogeley & Szalay, 1996). Given a data vector d with observed noise covariance 
A/", the KL vectors are the eigenvectors of the whitened total covariance 

C {e}tW =^- 1/2 C {e} M- 1 ' 2 . (2.55) 

These eigenvectors can be used to quickly compute the KL coefficients of the observed data, 

a {e} = v\ e} M- l ' 2 d. (2.56) 

For a given model we can predict the expected distribution of coefficients 

x m = ( a {e i } a {8 i }) 

= v\ e} M- l l 2 C m Af- l/2 V {e} +I. (2.57) 

If the length of the data vector d is N, then the full analysis results in -X^} being an 
N x N matrix. Alternatively, one can truncate the eigenvectors to n < N terms: this leads 
to ^{6>i} being an n x n matrix, and is equivalent to working with an optimal low-rank 
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approximation of the data d. Using this, the measure of departure from the model {9} is 
given by the quadratic form 

x\ e} = ^X^ i} a (2.58) 

The likelihood is then given by 

P(D|{0}/) = £(a|{^}) = (27rr/ 2 |det(X {ei} )|- 1 /2 exp (_ x 2 e}/2) {2 59) 

where n is the number of degrees of freedom: in most cases, n = N, the number of eigen- 



modes included in the analysis. The likelihood given by Equation 2.59 enters into Equa- 



tion 2.54 when computing the posterior probability. This sort of approach will be applied 



to observed shear in Chapter 5. 

2.5 Karhunen-Loeve Analysis of Shear 

In the case of shear observations, the observed vector 7 consists of the N ellipticity obser- 
vations within the series of window functions Ai(6,z), with i = 1...N. In general, these 
window functions can overlap, though in most cases this would lead to correlated noise 
which makes the analysis more difficult. The expected correlation matrix of the observed 
shear 7 is given by 

[Cylij = J d 2 6A t {e,z) j dV^(0',^+(|0 - 0'\) +Af ij . (2.60) 

where the matrix Al ij is the noise covariance between bins. The shear correlation function 
£ + can be computed from the theoretical 3D mass power spectrum, using the results from 



the previous chapter (eqs. |1.104 1.106). 



The noise matrix Jx can be estimated from the measurement process: in the simplest 
case where noise is due to shot-noise only and the windows Ai are non-overlapping, Af oc o\ I 
where <7j = <Ji n tl ^fN~i is the shot noise, given the intrinsic ellipticity cij n t and the number of 
galaxies iVj in the bin described by A{. 

Once the theoretical correlation matrix is computed, the KL basis can be determined 
using linear algebraic methods, and the basis functions can be employed in any of the variety 
of ways suggested above. In this work, we will explore three applications of this procedure. 
In Chapter 3, we will use the properties of the theoretical covariance as a basis for a filter 
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for constructing 3D mass maps from weak lensing shear observations. In Chapter 4, the KL 
modes will be used to correct for masking in a realistic shear map, and the resulting maps 
will be analyzed using shear peak statistics. In Chapter 5, the KL modes will be used for 
cosmological parameter estimation using shear catalogs from the COSMOS weak lensing 
survey. In all three cases, the ability to theoretically compute the covariance matrix of the 
observations leads to an analysis optimally tuned to the noise properties of the observed 
signal. 
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Chapter 3 

3D WEAK LENSING MAPS WITH KL 



The material in this chapter is adapted from VanderPlas et al. (2011 ). In it, we present a 



new method for constructing three-dimensional mass maps from gravitational lensing shear 
data. We solve the lensing inversion problem using truncation of singular values (within the 
context of generalized least squares estimation) without a priori assumptions about the sta- 
tistical nature of the signal. This singular value framework allows a quantitative comparison 
between different filtering methods: we evaluate our method beside the previously explored 
Wiener filter approaches. Our method yields near-optimal angular resolution of the lensing 
reconstruction and allows cluster sized halos to be de-blended robustly. It allows for mass 
reconstructions which are 2-3 orders-of-magnitude faster than the Wiener filter approach; 
in particular, we estimate that an all-sky reconstruction with arcminute resolution could be 
performed on a time-scale of hours on a current workstation. We find however that linear, 
non-parametric reconstructions have a fundamental limitation in the resolution achieved in 
the redshift direction. 

This chapter was originally published in collaboration with Andrew Connolly, Bhuvnesh 



Jain, and Mike Jarvis in the February 2011 edition of the Astronomical Journal (VanderPlas 



et al. 2011 ApJ, Vol. 727, p. 118; © 2011 by the American Astronomical Society) and is 
reproduced below with permission of the American Astronomical Society. 



3.1 Introduction 



Taylor| ( |200l[ ), |Hu & Keeton| ( |2002[ hereafter HK02) and |Bacon & Taylor| fl2003| ) first looked 
at non-parametric 3D mapping of a gravitational potential. HK02 presented a linear- 
algebraic method for tomographic mapping of the matter distribution - splitting the sources 
and lenses into discrete planes in redshift. They found that the inversion along each line-of- 
sight is ill-conditioned, and requires regularization through Wiener filtering. Wiener filtering 
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reduces reconstruction noise by using the expected statistical properties of the signal as a 



prior: for the present problem, this prior is the nonlinear mass power spectrum. Simon 



et al. (2009, hereafter STH09) made important advances to this method by constructing an 
efficient framework in which the inversions for every line-of-sight are computed simultane- 
ously, allowing for greater flexibility in the type of filter used. They introduced two types 
of Wiener filters: a "radial Wiener filter" , based on the HK02 method, and a "transverse 
Wiener filter" , based on the Limber approximation to the 3D mass power spectrum. They 
showed that the use of a generalized form of either filter leads to a biased result - the filtered 
reconstruction of the line-of-sight matter distribution for a localized lensing mass is both 
shifted and spread-out in redshift. 

One issue with the Wiener filter approach is the assumption of Gaussian statistics in 
the reconstructed signal. In reality, the matter distribution at relevant scales can be highly 
non-Gaussian. It is possible that the redshift bias found in STH09 is not inherent to 
nonparametric linear mapping, but rather a result of this deficiency in the Wiener filtering 
method. 

In this chapter, we develop an alternate noise-suppression scheme for tomographic map- 
ping that, unlike Wiener filtering, has no dependence on assumptions about the signal. 
Our goal is to explore improvements in the reconstruction and examine, in particular, the 



recovery of redshift information using the different methods. We begin in Section 3.2 by 



discussing the tomographic weak-lensing model developed by HK02 and STH09 and pre- 



senting our estimator for the density parameter, 5. In Sections 3.3 and 3.4 we implement 
this method for a simple case, and compare the results with those of the STH09 transverse 
and radial Wiener filters. 

3.2 Method 

For tomographic weak lensing, we are concerned with three quantities: the complex-valued 
shear 7(0,2), the real-valued convergence k(6,z), and the dimensionless density parameter 



5(6, z). As discussed in £1.7, the relationship between 7 and k is given by a convolution 
over all angles 6, and the density 5 is related to k by a line-of-sight integral over the lensing 
efficiency function, W(z,z s ). The key observation is that in the weak lensing regime, each 
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of these operations is linear: if the variables are discretized, they become systems of linear 
equations, which can in principle be solved using standard matrix methods. 

3.2.1 Linear Mapping 



Details of the weak lensing formalism are covered in £ 1.7.1 Here we'll briefly review the 
most relevant pieces. To compute 3D mass maps with weak lensing, we begin by creating 
a common pixel binning of the observed field of N x by N y equally sized square pixels of 
angular width A9 X = A9 y . Within each of the N x N y = N xy individual lines of sight, we bin 
7 into N s source-planes, and bin 5 into Ni lens-planes, Ni < N s . Thus we have two ID data 
vectors, which are concatenations of the line-of-sight vectors within each pixel: 7, of length 
N xy N s ; and 8, of length N xy N[. (Note that throughout this section, boldface denotes a 
vector quantity.) As a result of this binning, we can write the discretized lensing equations 
in a particularly simple form: 

7 = M 7<5 <5 + n 7 (3.1) 

where 7 is the vector of binned shear observations with noise given by n 7 , and 8 is the 
vector of binned density parameter. For details on the form of the matrix M~g, refer to 
Appendix [Bj 

The linear estimator 8 of the signal is found by minimizing the quantity 

X 2 = (7 - M 7 ^) t AT 77 - 1 ( 7 - M lS 8) (3.2) 

where f indicates the conjugate transpose, and A/^y 7 = (n 7 n 7 ^) is the noise covariance of 
the measurement 7, and we assume (n 7 ) = 0. The best linear unbiased estimator for this 



case is due to Aitken (1934): 

-1 



8a 



M^<Mrr~ lM i* M 7(5 t A/' 77 _1 7 (3.3) 



The noise properties of this estimator can be made clear by defining the matrix M-yS = 
A/' 77 _1 ^ 2 Ai 7 5 and computing the singular value decomposition (SVD) M^s = VSV^ . Here 
U j U = V j V = I and E is the square diagonal matrix of singular values a{ = Ejj, ordered 
such that crj > <7j + i, i > 1. Using these properties, the Aitken estimator can be equivalently 
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written 

S A = VS- 1 J7t A ^ y7 -i/2 7 (3.4) 

It is apparent in this expression that the presence of small singular values cxj <C o\ can lead 
to extremely large diagonal entries in the matrix X - 1 , which in turn amplify the errors in 
the estimator 8 a- This can be seen formally by expressing the noise covariance in terms of 
the components of the SVD: 

M SS = VE~ 2 Vl (3.5) 

This makes clear the connection with KL, as discussed in Chapter 2. The columns of the 
matrix V are eigenvectors of Afss, with eigenvalues cr~ 2 . When many small singular values 
are present, the noise will dominate the reconstruction, and it is necessary to use a more 
sophisticated estimator to recover the signal. 

3.2.2 KL Filtering 

One strategy that can be used to reduce this noise is to add a penalty function to the \ 2 
that will suppress the large spikes in signal. This is the Wiener filter approach explored by 
HK02 and STH09. A more direct noise-reduction method, which does not require knowledge 



of the statistical properties of the signal, involves approximating the SVD in Equation 3.4 
to remove the contribution of the high- noise modes. We choose a cutoff value <x cu t, and 
determine n such that a n > a cu t > cr n +i- We then define the truncated matrices U n , T, n , 
and Vn, such that U n (V n ) contains the first n columns of U (V), and S n is a diagonal 
matrix of the largest n singular values, n < n max . To the extent that a^ ut <C Ya=i °f ' ^ e 
truncated matrices satisfy 

UAVj « *7£Vt = M^s (3.6) 



and the signal estimator in Equation 3.3 can be approximated by the SVD estimator: 

Wn) = Vn^UlM^- 1 ' 2 ! (3.7) 
This approximation is optimal in the sense that it preferentially eliminates high-noise or- 



thogonal components in 8 (cf. equation 3.5), leading to an estimator which is much more 
robust to noise in 7. 
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SVDs are often used in the context of KL or Principal Component Analysis, where 
the square of the singular value is equal to the variance described by the corresponding 



principal component (see £2.3.5). The variance can be thought of, roughly, as a measure of 
the information contributed by the vector to the matrix in question. It will be useful for 
us to think about SVD truncation in this way. To that end, we define a measure of the 



truncated variance for a given value of n: 



-cut(n) = 1 - fkA (3-8) 

such that < v cnt < 1- If v cn t = 0, then n = n max and we are using the full Aitken 
estimator. As v cu t — > 1, we are increasing the amount of truncation. 

In practice, taking the SVD of the transformation matrix A/^ 7 M^s is not entirely 
straightforward: the matrix is of size (N xy N s ) x (N xy Ni). With a 128 x 128-pixel field, 20 
lens-planes, and 25 source-planes, the matrix contains 1.3 x 10 11 mostly nonzero complex 
entries, amounting to 2TB in memory (double precision). Computing the SVD for a non- 
sparse matrix of this size is far from trivial. 

We have developed a technique to speed-up this process, which involves decomposing 
the matrices A/" 77 and M^s into tensor products, so that the full SVD can be determined 
through computing SVDs of two smaller matrices: an N s x JVj matrix, and an N xy x N xy 
matrix. The second of these individual SVDs can be approximated using the Fourier-space 
properties of the 7 — > k mapping. The result is that the entire SVD estimator can be 
computed very quickly. The details of this method are described in Appendix |Bj 

3.3 Results 

Using the above formalism, we can now explore the tomographic weak lensing problem 



using the techniques of Section 3.2 For the following discussion, we will use a field of 
approximately one square degree: a 64 x 64 grid of V x V pixels, with 25 source redshift bins 
(0 < z < 2.0, Az = 0.08) and 20 lens redshift bins (0 < z < 2.0, Az = 0.1). This binning 
approximates the expected photometric redshift errors of future surveys. We suppress edge 
effects by increasing the noise of all pixels within 4' of the field border by a factor of 10 3 , 
effectively deweighting the signal in these pixels (cf. STH09). The noise for each redshift 
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M s Signular Values 
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Figure 3.1: Ordered singular values of the matrix M-yS- The dotted lines show the values 
of n such that 99%, 99.9%, and 99.99% of the variance is preserved. The sharp drop-off 
near n = 60, 000 is due to the 10 -3 deweighting of border pixels. 



bin is set to n« = a~/y/Nl, where <r 7 is the intrinsic ellipticity dispersion, and iVj is the 
number of galaxies in the bin. We assume <r 7 = 0.3 (based on the Hubble Deep Field image 



Mellier, 1999), and 70 galaxies per square arcminute, with a redshift distribution given by 



n{z) oc z exp 



( z / z o) 



3/2 



(3.9) 



with zq = 0.57. We assume a flat cosmology with h = 0.7, Qm = 0.27 and S7a = 0.73 at the 
present day. 

3.3.1 Singular Values 

The singular values of the transformation matrix for this configuration are depicted in 



Figure 3.1 The step pattern visible in this plot is due to the fact that the noise across 
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each source plane is identical, aside from the 4' deweighted border. It is apparent from this 
figure that the large majority of the singular values are very small: 99.9% of the variance in 
the transformation is contained in less than 1/3 of the singular values. The large number of 



very small singular values will, therefore, dominate in the Aitken estimator (Equation 3.4), 
leading to the very noisy unfiltered results seen in HK02. 



3.3.2 Evaluation of the SVD Estimator 

To evaluate the performance of the SVD filter, we first create a field-of-view containing a 
single halo at redshift z = 0.6. One well-supported parametrization of halo shapes is the 



NFW profile (Navarro et al. 1997). We use the analytic form of the shear and projected 



density due to an NFW profile, given by equations 13-18 in Takada k, Jain (2003). 



We reconstruct the density map using the SVD filter (Figure 3.2) with the above survey 
parameters. We show the results for three different values of v cut : 0.1, 0.01, and 0.005. 
In all three cases, the halo is easily detected at its correct location (left panels), although 
as v cu t decreases, there is more noise in the surrounding field. The right panels show the 
computed density profile along the line of sight for the central pixel. The peak of this curve 
is close to the correct redshift, but there is a significant spread in redshift, as well as a bias. 
As the level of SVD filtering (measured by v cut ) decreases, the magnitude of these effects 
decreases, but the increased noise leads to spurious peaks. 

Similar plots for the transverse Wiener filter recommended by STH09 are shown in the 
upper panels of Figure |3.3[ using their recommended value of a = 0.05. The response 
shows a significant spread in angular space, and the signal is seen to be suppressed by six 
orders-of-magnitude along with a similar suppression of the noise. These effects worsen, in 
general, as the filtering level a increases. Mathematically it is apparent why the transverse 
filter performs so poorly: the small singular values primarily come from the line-of-sight 
part of the mapping, and the this filter has no effect along the line-of-sight. 



The effect of the radial Wiener filter is shown in the bottom panels of Figure 3.3 It 
shares the positive aspects of the SVD filter, having very little signal suppression or angular 
spread. However, this filter uses some priors on the statistical form of the signal that 
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Figure 3.2: The effect of SVD truncation on a single z = 0.6 NFW halo in the center of 
the field, for three different levels of filtering, left column: reconstructed density parameter 
6(0) in the z = 0.6 lens-plane. The true matter distribution is represented by a tight "dot" 
in the center of the plot, right column: line-of-sight profile at the central pixel. The gray 
shaded area shows the input density parameter. The solid line shows the E-mode signal, 
while the dashed line shows the B-mode signal, n gives the number of singular values used 
in the reconstruction (out of a total n max = 81920), and t> cu t gives the amount of variance 



cut by the truncation (Equation 3.8); the level of filtration decreases from the top panels 
to the bottom panels. The bottom panels show a case of under-filtering: for small enough 
v cut , the noise overwhelms the signal and creates spurious peaks along the line-of-sight. 



58 



are not as physically well- motivated as those for the transverse Wiener filter. In contrast, 
the SVD filter does not make any prior assumptions about the signal. In this way, the 
SVD reconstruction can be thought of as even more non-parametric than the Wiener filter 
reconstructions . 



3.3.3 Comparison of Estimators 



The SVD framework laid out in Section 3.2.2 can be used to quantitatively compare the 



behavior of different estimators. A general linear estimator has the form 

Sr = Ri (3.10) 
for some matrix R. This general estimator can be expressed in terms of the components of 



the unbiased estimator (Equation 3.4): 



R=V R 'E- 1 UW^~ 1/2 . (3.11) 



Here the matrices XI, U and A/* 77 are defined as in Equation 3.4, and we have defined the 
matrix 

V R = RAf^^WE (3.12) 

The rows of the matrix Tl~ l U^ A/^-y -1 / 2 provide a convenient basis in which to work: they are 
the weighted principal components of the shear, ordered with decreasing signal to noise. The 
norm of the z th column of Vr measures the contribution of the i th mode to the reconstruction 
of 6. For the unfiltered estimator, Vr = V and all the norms are unity. This leads to a very 



intuitive comparison between different filtering schemes. Figure 3.4 compares the column- 



norms of Vr for the SVD filter with those of the radial and transverse Wiener filters. 



The steps visible in the plot originate the same way as the steps in Figure 3.1 the flatness 
of each step comes from the assumption of uniform noise in each source plane. This plot 
shows the tradeoff between noise and bias. The flat line at norm=10° represents a noisy but 
unbiased estimator. Any departure from this will impose a bias, but can increase signal-to- 
noise. There are two important observations from this figure. First, because each step on the 
plot is relatively flat for the SVD filter and radial Wiener filter, we don't expect much bias 



59 



within each lens plane. The transverse filter, on the other hand, has fluctuations at the 10% 



level within each step (visible in the inset of Figure 3.4), which will lead to a noticeable bias 



within each lens plane, resulting in the degraded angular resolution of the reconstruction 



seen in Figure 3.3 Second, the transverse Wiener filter deweights even the highest signal- 



to-noise modes by many orders of magnitude, resulting in the signal suppression seen in 



Figure 3.3. The SVD filter and radial Wiener filter, on the other hand, have weights near 
unity for the highest signal-to-noise modes. These two observations show why the SVD filter 
and radial Wiener filter are the more successful noise reduction techniques for the present 
problem. 

3.3.4 Noise Properties of Line- of- Sight Modes 



As seen in equation 3.5 the columns of V provide a natural orthogonal basis in which to 
express the signal S. It should be emphasized that this eigenbasis is valid for any linear 
filtering scheme: the untruncated SVD is simply an equivalent re-expression of the orig- 
inal transformation. Examining the characteristics of these eigenmodes can yield insight 
regardless of the filtering method used. 



The radial components of the first four eigenmodes are plotted in figure 3.5 Each is 
labeled by its normalized noise level, rtj = {o~i/ o~\)~ l . The total number of modes will be 
equal to the number of output redshift bins; here, for clarity, we've used 80 equally-spaced 
bins out to redshift 2.0. As the resolution is lessened, the overall shape and relative noise 
level of the lower-order modes is maintained. These radial modes are analogous to angular 
Fourier modes, and are related to the signal-to-noise KL modes discussed in HK02. It is 
clear from this plot that any linear, non-parametric estimator will be fundamentally limited 
in its redshift resolution: the noise level of the i th mode approximately scales as 

rii 5c i 2 (3.13) 

The signal-to- noise level for any particular halo will depend on its mass and redshift. The 
magnitude of the signal scales linearly with mass (see discussion in STH09), but the redshift 
dependence is more complicated: it is affected by the lensing efficiency function, which 
depends on the redshift of the lensed galaxies. Using the above survey parameters, with 
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an NFW halo of mass M200 = 10 15 M Q and redshift z = 0.6, the signal-to-noise ratio of the 
central pixel for the fundamental radial mode is ~ 5.9, consistent with the results for Wiener 
filtered reconstructions of singular isothermal halos explored in STH09. This means that 
for even the largest halos, with a very deep survey, only the first few modes will contribute 
significantly to the reconstructed halo. Adding higher-order modes can in theory provide 
redshift information, but at the cost of increasingly high noise contamination. This is a 
general result which will apply to all nonpar ametric linear reconstruction algorithms. 

This lack of information in the redshift direction leads directly to an inability to ac- 
curately determine halo masses: the lensing equations relate observed shear 7 to density 
parameter 6, which is related to mass in a redshift-dependent way. This is a fundamental 
limitation on the ability of linear nonparametric methods to determine halo masses from 
shear data. Indeed, even moving to fully parametric models, line-of-sight effects can lead 



to halo mass errors of 20% or more (Hoekstra, 2003 de Putter & White 2005). 



3.3.5 Reconstruction of a Realistic Field 



To compare the performance of the three filtering methods for a realistic field, we create a 4 





square degree field with approximately 20 halos between masses of 2 x 10 14 and 8 x 10 14 M G 



with a mass distribution approximating the cluster mass function of Rines et al. (2007) 



and a redshift distribution given by Equation 3.9 adding a hard cutoff at z = 1.0. These 



parameters are chosen to approximate the true distribution of observable halos in a field 



this size. The results of the reconstruction are shown in Figure 3.6 

The red circles are the locations of the input halos, not the result of some halo-detection 
algorithm. However, it is clear that, for at least most of the mass range, we are able to 
produce a map for which any reasonable detection algorithm should detect the halos in the 
correct locations. A few of the lower mass halos would certainly be missed though, since 
they are not significantly different from the noise peaks in the image. 



In practice, one may vary the parameter v cut as in Figure |3.2| to trade-off robustness 

we 



of detecting peaks with resolution in angle and in redshift. As shown in Section 3.3.3 



expect filtering to introduce very little bias in angular resolution, so large values of v cnt 
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lead to the most robust angular results. On the other hand, as shown in Section 3.3.4 



filtering introduces an extreme bias along the line-of-sight. The effects of this bias can be 



seen qualitatively in the right column of Figure 3.2 Optimal redshift resolution requires 
choosing a filtering level which balances the effects of noise and bias, and may require some 
form of bias correction. In future work, we will explore in detail the ways in which the SVD 
method allows for a near optimal reconstruction of projected mass maps and halo redshifts 
from data on galaxy shapes and photometric redshifts. 



3.3.6 Scalability 

As we look forward to future surveys, it becomes important to consider methods that will 
scale upward with increasing survey volumes. Present weak lensing surveys cover fields on 



the order of a few square degrees (e.g. COSMOS, Massey et al. 2007). Future surveys 



will increase the field size exponentially: up to ~ 20, 000 square degrees for LSST (LSST 



Science Collaborations et al. , 2009 ) . Though the flat-sky approximation used in this work 



is not appropriate for such large survey areas, the weak lensing formalism can be modified 



to account for spherical geometry (see, e.g. Heavens 2003) 



The main computational cost for both SVD and Wiener filtering is the Fast Fourier 
Transform (FFT) required to implement the mapping from 7 to k. For an N x N pixel field, 
the FFT algorithm performs in 0[N log N] in each dimension, meaning that the 2D FFT 
takes 0[(N log N) 2 ] ~ 0[N 2 }. The Wiener filter method, however, requires the inversion 
of a very large matrix using, for example, a conjugate- gradient method. The exact number 
of iterations depends highly on the condition number of the matrix to be inverted; STH09 
finds that up to 150 iterations are required for this problem. We find that each iteration 
takes over 3 times longer than the entire SVD reconstruction. The net result is that both 
algorithms scale nearly linearly with the area of the field (for constant pixel scale), though 
the SVD estimator is computed up to 500 times faster than the Wiener filter. 

Extrapolating this scaling, the appropriately scaled SVD filter will allow reconstruction 
of the entire ~20, 000 square-degree LSST field in a few hours on a single ~GHz processor, 
given enough memory. On the same computer, the Wiener-filter method would take over 
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Table 3.1: Masses and redshifts of halos in Figure 3.6 



M/Mq 



A 


37.5 


44.9 


0.60 


7 


2 x 10 


B 


67.1 


70.5 


0.47 


6 


x 10 14 


C 


46.9 


106.5 


0.63 


5 


5 x 10 


D 


108.9 


94.3 


0.63 


5 


4 x 10 


E 


97.9 


63.6 


0.39 


4 


9 x 10 


F 


102.4 


84.8 


0.70 


3 


8 x 10 


G 


77.0 


49.6 


0.58 


3 


2 x 10 


H 


52.0 


48.5 


0.36 


3 


2 x 10 


I 


72.6 


45.6 


0.78 


2 


9 x 10 


J 


68.6 


64.5 


0.68 


2 


5 x 10 


K 


8.6 


34.5 


0.32 


2 


3 x 10 


L 


10.5 


49.5 


0.51 


2 


3 x 10 


M 


99.4 


56.5 


0.22 


2 


3 x 10 


N 


21.7 


53.1 


0.76 


2 


3 x 10 





31.6 


102.1 


0.69 


2 


2 x 10 


P 


69.7 


33.2 


0.39 


2 


2 x 10 



a month, depending on the amount and type of filtering and assuming that the required 
number of iterations stays constant with increasing field size. For the SVD-filtered recon- 
struction of this large field, the real challenge will not be computational time, but memory 
constraints: the complex shear vector itself for such a field will require ~30 GB of memory, 
with the entire algorithm consuming approximately three times this. The memory require- 
ments for the Wiener filter will be comparable. This is within reach of current high-end 
workstations as well as shared- memory parallel clusters. 
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3.4 Conclusion 

We have presented a new method for producing tomographic maps of dark matter through 
weak lensing, using truncation of singular values. We have tested and compared our method 
to the Wiener filter based method of STH09, which is the first three-dimensional mass 
mapping approach that is applicable to large area surveys. Our reconstruction shares many 
of the aspects of the Wiener filter reconstruction, in the sense that it massively reduces 
the noise inherent in the problem. Our SVD method may be considered even more non- 
parametric than the Wiener filter method, since it does not rely on any a priori assumptions 
of the statistical properties of the signal: all of the noise reduction is derived from the 
observed noise properties of the data. 

The SVD framework allows a unique quantitative comparison between the different 
filtering methods and filtering strengths. Using the coefficients of the weighted principal 
components contained in the SVD, we have compared the three filtering methods, and have 
found that the radial Wiener filter of HK02 and SVD filter of this work are less-biased 
noise reduction techniques than the transverse Wiener filter of STH09. These authors have 
recently implemented the radial Wiener filter and obtain results consistent with our findings 
(P. Simon and A. Taylor, private communication). 

The angular resolution of the SVD-reconstructed mass maps seems to be significantly 
better than that of the transverse Wiener filter method, the method chosen in the STH09 
analysis. This allows for more robust separation of pairs of halos into two separate halos 
rather than blurring them into a single mass peak. We discuss how our reconstruction 
method provides a scheme for optimizing the 3D reconstruction of projected mass maps 
by balancing the goals of robustness of detecting specific structures and improved redshift 
resolution. 

The SVD method can compute the three-dimensional mass maps rapidly provided suf- 
ficient computational memory is available. This allows for the possibility of solving the 
full-sky tomographic lensing inversion on the scale of hours, rather than months, which 
makes it readily applicable to upcoming surveys. 

On the other hand, the redshift resolution with the SVD method is not significantly bet- 
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ter than that of either Wiener filter method. This was a problem identified by STH09, and 
unfortunately the SVD method does not significantly improve the situation. Our analysis of 
the noise characteristics of radial modes indicates that linear, non-parametric reconstruction 
methods are fundamentally limited in this regard. 
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Figure 3.3: The effect of Wiener filtering on the same input as Figure 3.2 Here we 
have used both transverse (top panels) and radial (bottom panels) Wiener filtering, both 
down-tuned by a = 0.05 (the value recommended by STH09). The transverse Wiener filter 
suppresses the response by several orders of magnitude; a closer view of the line-of-sight 
peak is shown in the inset plot. The radial Wiener filter gives similar angular results to the 
SVD filter, but takes much longer to compute. 
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Figure 3.4: Contribution of each shear mode to the reconstruction for three different filters. 
The dotted line at 10° represents the unfiltered result. Each filtering method leads to a 
different weighting of the shear modes. The SVD filter, by design, completely removes 
higher-order modes beyond a given cutoff, while the Wiener filter deweights modes in a 
more gradual fashion. Note that the transverse Wiener filter deweights all modes by up to 
seven orders of magnitude; it has been scaled by a factor of 10 4 for this plot. The inset 
plot shows a closeup of the fluctuations within each "step" of the transverse filter. These 



fluctuations lead to angular spread in the response (see discussion in Section 3.3.3) 
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Figure 3.5: left panel: The radial components of the first four columns of the matrix V (see 
section 



3.2.1). This is calculated for 100 equally spaced redshift bins (0 < z < 2.5) in 7, 



and 80 bins (0 < z < 2.0) in S These orthogonal eigenmodes are analogous to radial Fourier 
modes. Each is labeled by its relative noise level, = (o^/cri) -1 . right panel: The singular 
values G{ associated with the 80 radial eigenmodes. 
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Figure 3.6: Reconstruction of an artificial shear field with the SVD filter (top panels), 
Transverse Wiener filter (middle panels), and Radial Wiener filter (bottom panels). The 
left column shows the projected density reconstruction across the field using each method, 
all smoothed with a 1-pixel wide Gaussian filter. Red circles indicate the true locations 
of the input halos. The right column shows the line-of-sight distributions of the twelve 
most massive NFW halos, labeled A-L. The masses and redshifts of the halos are listed 
in Table 3.1 The signal suppression of the transverse Wiener filter seen in Figure 3.3 is 
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Chapter 4 

SHEAR PEAK STATISTICS WITH KL 



This chapter will cover the results from VanderPlas et al. (2012). In it, we explore 
the utility of Karhunen Loeve (KL) analysis in solving practical problems in the analysis 
of gravitational shear surveys, with a specific application to cosmological constraints from 
shear peak statistics. Shear catalogs from large-field weak lensing surveys will be subject to 
many systematic limitations, notably incomplete coverage and pixel-level masking due to 
foreground sources. We develop a method to use two dimensional KL eigenmodes of shear 
to interpolate noisy shear measurements across masked regions. We explore the results of 
this method with simulated shear catalogs, using statistics of high-convergence regions in 
the resulting map. We find that the KL procedure not only minimizes the bias due to 
masked regions in the field, it also reduces spurious peak counts from shape noise by a 
factor of ~ 3 in the cosmologically sensitive regime. This indicates that KL reconstructions 
of masked shear are not only useful for creating robust convergence maps from masked shear 
catalogs, but also offer promise of improved parameter constraints within studies of shear 
peak statistics. 

This chapter was originally published in collaboration with Andrew Connolly, Bhuvnesh 



Jain, and Mike Jarvis in the January 2012 edition of the Astronomical Journal (VanderPlas 



et al. 2012, ApJ, Vol. 744, p. 180; © 2012 by the American Astronomical Society) and is 



reproduced below with permission of the American Astronomical Society. 



4.1 Introduction 



Currently, a new generation of wide-field weak lensing surveys are in the planning and 
construction stages. Among them are the the Dark Energy Survey (DES), the Panoramic 
Survey Telescope &: Rapid Response System (PanSTARRS), the Wide Field Infrared Survey 
Telescope (WFIRST), and the Large Synoptic Survey Telescope (LSST), to name a few. 
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These surveys, though not as deep as small-field space-based lensing surveys, will cover 
orders-of-magnitude more area on the sky: up to ~ 20, 000 square degrees in the case of 
LSST. 

This is a fundamentally different regime than early weak lensing reconstructions of single 
massive clusters: the strength of the shear signal is only ~ 1%, and is dominated by 
~ 30% intrinsic shape noise. This, combined with source galaxy densities of only n ~ 
20 — 50 arcmin -2 (compared with n > 100 arcmin -2 for deep, space-based surveys) and 
atmospheric PSF effects leads to a situation where the signal is very small compared to the 
noise. Additionally, in the wide-field regime, the above-mentioned priors cannot be used. 
Nevertheless, many methods have been developed to extract useful information from wide- 
field cosmic shear surveys, including measuring the N-point power spectra and correlation 



functions (Schneider et al. 2002a Takada & Jain, 2004 Hikage et al. 2010), performing log 



transforms of the convergence field (Neyrinck et al. 2009, 2010 Scherrer et al. 2010, Seo 



et al. 2011), analyzing statistics of convergence and aperture mass peaks (Marian et al. 



2010, 


Dietrich & Hartlap, 2010; Schmidt & Rozo, 


2010; 


Kratochvil et al. , 2010 ; Maturi et al. 



2011 ). Another well- motivated application of wide- field weak lensing is using wide-field mass 
reconstructions to minimize the effect mass-sheet degeneracy in halo mass determination. 
Many of the above applications require reliable recovery of the projected density, either 



in the form of the convergence k, or filter-based quantities such as aperture mass (Schneider 



et al. 1998). Because each of these amounts to a non-local filtering of the shear, the 
presence of masked regions can lead to a bias across significant portions of the resulting 
maps. Many of these methods have been demonstrated only within the context of idealized 
surveys, with exploration of the complications of real-world survey geometry left for future 
study. Correction for masked pixels has been studied within the context of shear power 



spectra (Schneider et al. 2010 Hikage et al. 2010) but has not yet been systematically 



addressed within the context of mapmaking and the associated statistical methods (see, 



Padmanabhan et al. 


2003 


Pires et al. 


2009 



propose to address this missing data problem through Karhunen-Loeve (KL) analysis. 



In Section 4.2 we summarize the theory of KL analysis in the context of shear measure- 
ments, including the use of KL for interpolation across masked regions of the observed field. 
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In Section 4.3 we show the shear eigenmodes for a particular choice of survey geometry, and 



use these eigenmodes to interpolate across an artificially masked region in a simulated shear 



catalog. In Section 4.4 we discuss the nascent field of "shear peak statistics", the study of 
the properties of projected density peaks, and propose this as a test of the possible bias 



imposed by KL analysis of shear. In Section 4.5 we utilize simulated shear catalogs in order 
to test the effect of KL interpolation on the statistics of shear peaks. 

4.2 Karhunen-Loeve Analysis of Shear 

As discussed in Chapter 2, KL analysis is a commonly used statistical tool in a broad range 



of astronomical applications, from, e.g. studies of galaxy and quasar spectra ( Connolly et al. 



1995; Connolly &; Szalay 1999 Yip et al. , 2004afb ), to analysis of the spatial distribution of 



galaxies (Vogclcy & Szalay, 1996 Matsubara et al., 2000, Pope et al. 2004), to characteri 



zation of the expected errors in weak lensing surveys ( Kilbinger & Munshi , 2006 ; Munshi Sz 



Kilbinger, 2006). A full description of KL analysis is presented in Chapter 2; here we will 



briefly review the points relevant to this chapter. 

In general, any set of iV-dimensional data can be represented as a sum of N orthogonal 
basis functions: this amounts to a rotation and scaling of the TV-dimensional coordinate 
axis spanning the space in which the data live. KL analysis seeks a set of orthonormal basis 
functions which can optimally represent the dataset. The sense in which the KL basis is 
optimal will be discussed below. For the current work, the data we wish to represent are the 
observed gravitational shear measurements across the sky. We will divide the survey area 
into iV discrete cells, at locations Xi, 1 < i < N. From the ellipticity of the galaxies within 
each cell, we infer the observed shear j°(xi), which we assume to be a linear combination 
of the true underlying shear, ^y(xi) and the shape noise n 7 (ajj)j^] In general, the cells may 
be of any shape (even overlapping) and may also take into account the redshift of sources. 
In this analysis, the cells will be square pixels across the locally flat shear field, with no use 
of source redshift information. For notational clarity, we will represent quantities with a 
vector notation, denoted by bold face: i.e. 7 = [71,72 • • -] T ; 7« = 



1 Throughout this chapter, we assume we are in the regime where the convergence k < 1 so that the 
average observed ellipticity in a cell is an unbiased estimator of shear; see Bartelmann & Schneider (2001 1 
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4-2.1 KL Formalism 

As discussed in Chapter 2, KL analysis provides an optimal framework such that our mea- 
surements 7 can be expanded in a set of N orthonormal basis functions {^(ajj), j = 1, TV}, 
via a vector of coefficients a. In matrix form, the KL projection of the observed shear can 
be written 

7 = *a (4.1) 

where the columns of the matrix \I/ are the basis vectors Orthonormality is given by 
the condition ^\^fj = so that the coefficients can be determined by 

a = * f 7 (4.2) 

A KL decomposition is optimal in the sense that it seeks basis functions for which the 
coefficients are statistically orthogonalj^] that is, they satisfy 

(a*aj) = {al)8ij (4.3) 

where angled braces (• • •) denote averaging over all realizations. This definition leads to 



several important properties (see Vogeley & Szalay, 1996, for a thorough discussion &; 
derivation) : 

1. KL as an Eigenvalue Problem: Defining the correlation matrix ^ = {jijj), it 
can be shown that the KL vectors are eigenvectors of £ with eigenvalues Aj = (af). 
For clarity, we'll order the eigenbasis such that Aj > Aj + i V % E (1, N — 1). We define 
the diagonal matrix of eigenvalues A, such that Ajj = \i5ij and write the eigenvalue 
decomposition in compact form: 

g, = *A* f (4.4) 

2. KL as a Ranking of Signal-to-Noise It can be shown that KL vectors of a whitened 



covariance matrix (see Section 4.2.2) diagonalize both the signal and the noise of the 
problem, with the signal-to-noise ratio proportional to the eigenvalue. This is why 
KL modes are often called "Signal-to- noise eigenmodes". 

2 Note that statistical orthogonality of coefficients is conceptually distinct from the geometric orthogonality 



of the basis functions themselves; see Vogeley & Szalay (JT996J) for a discussion of this property. 
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3. KL as an Optimal Low-dimensional Representation: An important conse- 
quence of the signal-to-noise properties of KL modes is that the optimal rank-ra rep- 
resentation of the data is contained in the KL vectors corresponding to the n largest 
eigenvalues: that is, 

n<N 

7 (n) = E a **< ( 4 - 5 ) 

1=1 

minimizes the reconstruction error between 7^ and 7 for reconstructions using n or- 
thogonal basis vectors. This is the theoretical basis of Principal Component Analysis 
(sometimes called Discrete KL), and leads to a common application of KL decom- 
position: filtration of noisy signals. For notational compactness, we will define the 
truncated eigenbasis &( n ) and truncated vector of coefficients at n ) such that Equa- 
tion 4.5 can be written in matrix form: 7^ = Sbr n \ai n y 



4-2.2 KL in the Presence of Noise 

When noise is present in the data, the above properties do not necessarily hold. To sat- 
isfy the statistical orthogonality of the KL coefficients a and the resulting signal-to-noise 
properties of the KL eigenmodes, it is essential that the noise in the covariance matrix be 
"white": that is, A/" 7 = (n 7 n 7 ) oc i". This can be accomplished through a judicious choice 
of binning, or by rescaling the covariance with a whitening transformation. We take the 
latter approach here. 

Defining the noise covariance matrix A/" 7 as above, the whitened covariance matrix can 
be written £ w = A/" 7 1 ^ 2 ^A/' 7 1//2 . Then the whitened KL modes become SI/vyA^/ 1 ]/]^ = £ w . 
The coefficients aw are calculated from the noise-weighted signal, that is 

a w = *\ v M- 1/2 { 1 + n 1 ) (4.6) 

For the whitened KL modes, if signal and noise are uncorrelated, this leads to (ai^aj^) = 
Avk + /: that is, the coefficients aw are statistically orthogonal. For the remainder of this 
work, we will drop the subscript 'V and assume all quantities to be those associated with 
the whitened covariance. 



74 



4-2.3 Computing the Shear Correlation Matrix 

The KL reconstruction of shear requires knowledge of the form of the pixel-to-pixel cor- 



relation matrix In many applications of KL (e.g. analysis of galaxy spectra, Connolly 



et al. 1995) this correlation matrix is determined empirically from many realizations of the 



data (i.e. the set of observed spectra). In the case of weak lensing shear, we generally don't 
have many realizations of the data, so this approach is not tenable. Instead, we compute 
this correlation matrix analytically. The correlation of the cosmic shear signal between two 
regions of the sky Ai and Aj is given by 



€ij = (lilj) + (run*) 

/ d 2 xi / d 2 Xji + (\xi - xj\ 

J Ai J An 



(77 



11 



(4.7) 



where a e is the intrinsic shape noise (typically assumed to be ~ 0.3), n is the average galaxy 



count per pixel, and £+(#) is the "+" shear correlation function (Schneider et al. 2002a). 
£+(#) can be expressed as an integral over the shear power spectrum: 



i 

2^ 



all tP 1 (i)J {£9) 



(4.8 



where Jo is the zeroth-order Bessel function of the first kind. The shear power spectrum 
Pj(£) can be expressed as an appropriately weighted line-of-sight integral over the 3D mass 



power spectrum (see, e.g. Takada & Jain, 2004): 



PJi) 



o V X 



(4.9) 



Here \ is t ne comoving distance, Xs is the distance to the source, and W(x) is the lensing 
weight function, 



«(x») v(V) - v 

W( X ) = "7"; u 7» A. / dzn(z) X( { - X (4.10) 



3^m,0^Q X 

2o(x) n g J z(x) • ' X {z 

where n(z) is the redshift distribution of galaxies. We assume a DES-like survey, where 
n{z) has the approximate form 



n(z) oc z l exp[— (z/zq) 1 ' 5 ] 



(4.11) 



with zq = 0.5, where n(z) is normalized to the observed galaxy density n g = 20 arcmin 2 . 
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The 3D mass power spectrum Pg(k,z) in Equation 4.9 can be computed theoretically. 



In this work we compute P$(k, z) using the halo model of Smith et al. (2003), and compute 



the correlation matrix £ using Equations 4.7 4.11 When computing the double integral 



of Equation 4.7, we calculate the integral in two separate regimes: for large separations 
(9 > 20 arcmin), we assume £+(0) doesn't change appreciably over the area of the pixels, 
so that only a single evaluation of the x+(0) is necessary for each pixel pair. For smaller 
separations, this approximation is insufficient, and we evaluate using a Monte-Carlo 
integration scheme. Having calculated the theoretical correlation matrix £ for a given field, 
we compute the KL basis directly using an eigenvalue decomposition. 

4-2-4 Which Shear Correlation? 

Above we note that the correlation matrix of the measured shear can be expressed in terms 
of the "+" correlation function, £+(#)■ This is not the only option for measurement of shear 



correlations (see, e.g. Schneider et al. 2002a). So why use £+(#) rather than £_(#)? The 



answer lies in the KL formalism itself. The KL basis of a quantity 7 is constructed via its 
correlation (77^). Because of the complex conjugation involved in this expression, the only 
relevant correlation function for KL is by definition. Nevertheless, one could object 

that by neglecting KL under-utilizes the theoretical information available about the 
correlations of cosmic shear. However, in the absence of noise, the two correlation functions 
contain identical information: either function can be determined from the other. In this 
sense, the above KL formalism uses all the shear correlation information that is available. 

One curious aspect of this formalism is that the theoretical covariance matrix and asso- 
ciated eigenmodes are real-valued, while the shear we are trying to reconstruct is complex- 
valued. This can be traced to the computation of the shear correlation: 

£+ = (77*) = (itlt) + (7x7x> +«[<7t7x) - (7x7t>] (4-12) 

By symmetry, the imaginary part of this expression is zero. At first glance, this might seem 
a bit strange: how can a complex-valued data vector be reconstructed from a real-valued 
orthogonal basis? The answer lies in the complex KL coefficients af though each KL mode 
contributes only a single phase across the field (given by the phase of the associated a«), the 
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reconstruction has a plurality of phases due to the varying magnitudes of the contributions 
at each pixel (given by the elements of each basis vector 

An important consequence of this observation is that the KL modes themselves are not 
sensitive by construction to the E-mode (curl-free) and B-mode (divergence-free) compo- 
nents of the shear field. As we will show below, however, the signal-to-noise properties of 
KL modes lead to some degree of sensitivity to the E and B-mode information in a given 



shear field (See Section 4.5). 



4-2.5 Interpolation using KL Modes 

Shear catalogs, in general, are an incomplete and inhomogeneous tracer of the underlying 
shear field, and some regions of the field may contain no shear information. This sparsity of 
data poses a problem, because the KL modes are no longer orthogonal over the incomplete 



field. Connolly & Szalay ( 1999 ) demonstrated how this missing-information problem can be 
addressed for KL decompositions of galaxy spectra. This application is discussed in more 
detail in Chapter 2; here we will briefly summarize the results using the notation of shear 
studies. First we define the weight function w(xi). The weight function can be defined in 
one of two ways: a binary weighting convention where w(xi) = in masked pixels and 1 
elsewhere, or a continuous weighting convention where w{xi) scales inversely with the noise 
[A/" 7 ]m. The binary weighting convention treats the noise is part of the data, and so the 



measurements should be whitened as outlined in Section 4.2.2 The continuous weighting 
convention assumes the noise is part of the mask, so data and noise are not whitened. We 
find that the two approaches lead to qualitatively similar results, and choose to use the 
binary weighting convention for the simplicity of comparing masked and unmasked cases. 

Let 7 be the observed data vector, which is unconstrained where w{xi) = 0. Then we 
can obtain the KL coefficients by minimizing the reconstruction error of the whitened 
data 

X 2 = (AT- 1/2 Y - * w a (n) )tw(A^- 1 /2 7 o _ * (f0O(n) ) (4.13) 
where we have defined the diagonal weight matrix Wij = w(xi)5ij. Minimizing Equa- 
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tion 4.13 with respect to a leads to the optimal estimator a, which can be expressed 



■ (n) (n) 

Where we have defined the mask convolution matrix M 



(4.14) 



(n) 



^\^W^ V n ). These coefficients 



(n) 



a( n -j can then be used to construct an estimator for the unmasked shear field: 



7 W =A^ /2 * (n) a (n) 



(4.15) 



In cases where the mask convolution matrix is singular or nearly singular, the esti- 



mator in Equation 4.15 can contain unrealistically large values within the reconstruction 



<y( n ) This can be addressed either by reducing n, or by adding a penalty function to the 



right side of Equation 4.13. One convenient form of this penalty is the generalized Wiener 



filter (see Tegmark, 1997), which penalizes results which deviate from the expected correla- 



tion matrix. Because the correlation matrix has already been computed when determining 
the KL modes, this filter requires very little extra computation. With Wiener filtering, 



Equation 4.13 becomes 



X 



(AT~ 1/2 ~f° - * (n) a (n) )tw(AT- 1 /V - * (ft) a (f0 ) 
+a at \C~} x a ( 



(4.16) 



V) a(n)"W 

where C a ( n ) = (a^a^) and a is a tuning parameter which lies in the range < a < 
1. Note that for a = 0, the result is the same as in the unfiltered case. Minimizing 



Equation |4. 16| with respect to a gives the filtered estimator 

Hn, a) =M-l a) *\ n) WM- 1 l 2 1 ° (4.17) 

where we have defined M( n>a } = [^Lw*^) + aA^J], and A( n ) is the truncated diagonal 
matrix of eigenvalues associated with ^( n )- 



4.3 Testing KL Reconstructions 



In this section we show results of the KL analysis of shear fields for a sample geometry. 
In Section 4.3.1 we discuss the general properties of shear KL modes for unmasked fields, 
while in Section 4.3.2 we discuss KL shear reconstruction in the presence of masking. 
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Figure 4.1: A sample of nine of the 4096 KL eigenmodes of a 1° x 1° patch of the sky 
partitioned into 64 x 64 pixels. Black is positive, red is negative, and each mode has 
unit norm. The modes are calculated from the theoretical shear correlation function (see 



Section 4.2.3). As a consequence of the isotropy of the cosmic shear field, the covariance 



matrix - and thus the associated eigenmodes - are purely real (see Section 4.3.1). 
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4-3.1 KL Decomposition of a Single Field 

To demonstrate the KL decomposition of a shear field, we assume a square field of size 
1° x 1°, divided into 64 x 64 pixels. We assume a source galaxy density of 20 arcmin~ 2 - 
appropriate for a ground-based survey such as DES - and calculate the KL basis following 



the method outlined in Section 4.2.1 For the computation of the nonlinear matter power 
spectrum, we assume a flat ACDM cosmology with £Im = 0.27 at the present day, with the 
power spectrum normalization given by as = 0.81. 



Figure 4.1 shows a selection of nine of the 4096 shear eigenmodes within this framework. 
The KL modes are reminiscent of 2D Fourier modes, with higher-order modes probing 
progressively smaller length scales. This characteristic length scale of the eigenmodes can 



be seen quantitatively in Figure 4.2 Here we have computed the rotationally averaged 
power spectrum Cg for each individual Fourier mode, and plotted the power vertically as a 
density plot for each mode number. Because the KL modes are not precisely equivalent to 
the 2D Fourier modes, each contains power at a range of values in I. But the overall trend 
is clear: larger modes probe smaller length scales, and the modes are very close to Fourier 
in nature. 



As noted in Section 4.2, one useful quality of a KL decomposition is its diagonalization 



of the signal and noise of the problem. To explore this property, we plot in the upper panel 



of Figure 4.3 the eigenvalue profile of these KL modes. By construction, higher-order modes 



have smaller KL eigenvalues. What is more, because the noise in the covariance matrix is 



whitened (see Section 4.2.2), the expectation of the noise covariance within each mode is 
equal to 1. Subtracting this noise from each eigenvalue gives the expectation value of the 
signal-to-noise ratio: thus we see that the expected signal-to-noise ratio of the eigenmodes 
is above unity only for the first 17 of the 4096 modes. 

At first glance, this may seem to imply that only the first 17 or so modes are useful in 
a reconstruction. On the contrary: as seen in the lower panel of Figure |4.3[ these first 17 
modes contain only a small fraction of the total information in the shear field (This is not an 
unexpected result: cosmic shear measurements have notoriously low signal-to- noise ratios!) 
About 900 modes are needed to preserve an average of 70% of the total signal, and at this 
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Figure 4.2: The normalized power spectrum of each KL mode. For constant mode number, 
the figure represents a histogram of the power in that KL mode, normalized to a constant 
total power. KL modes represent a linear combination of Fourier modes, so that the power 
in each KL mode is spread over a range of I values. Nevertheless, the general trend is clear: 
larger mode numbers are associated with larger wave numbers, and thus smaller length 
scales. 
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level, each additional mode has a signal-to-noise ratio of below 1/10. The noisy input shear 
field can be exactly recovered by using all 4096 modes: in this case, though, the final few 
modes contribute two orders-of-magnitude more noise than signal. 



4-3.2 Testing KL Interpolation 

To test this KL interpolation technique, we use simulated shear catalog^} These catalogs 
contain 220 square degrees of simulated shear maps, computed using a ray-tracing grid 
through a cosmological N-body simulation of the standard A-CDM model. The shear signal 
is computed at the locations of background galaxies with a median redshift of about 0.7. 



Galaxies are incorporated in the simulation using the ADDGALS algorithm ( Wechsler 2004 



Wechsler et al. in preparation), tuned to the expected observational characteristics of the 
DES mission. 

We pixelize this shear field using the same pixel size as above: 64 x 64 pixels per square 
degree. To perform the KL procedure on the full field with this angular resolution would 
lead to a data vector containing over 10 6 elements, and an associated covariance matrix 
containing 10 12 entries. A full eigenvalue decomposition of such a matrix is computationally 
infeasible, so we reconstruct the field in 1° x 1° tiles, each 64 x 64 pixels in size. To reduce 
edge effects between these tiles, we use only the central 0.5° x 0.5° region of each, so that 
covering the 300 square degree field requires 1200 tiles. 

In order to generate a realistic mask over the field area, we follow the procedure outlined 
in Hikage et al. (2010) which generates pixel-level masks characteristic of point-sources, 



saturation spikes, and bad CCD regions. We tune the mask so that 20% of the shear pixels 
have no data. The geometry of the mask over a representative patch of the field can be 



seen in the lower panels of Figure 4.4, where we also show the result of the KL interpolation 
using 900 out of 4096 modes, with a = 0.15 (For a discussion of these parameter choices, 
see Appendix [Cl) . 



The upper panels of Figure 4.4 give a qualitative view of the difficulty of cosmic shear 
measurements. The top left panel shows the noiseless shear across the field, while the top 



3 The simulated shear catalogs were kindly made available to us by R. Wechsler, M Busha, and M. Becker. 
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Figure 4.3: The eigenvalues associated with the eigenmodes discussed in Figure 4.1 By 
construction, the eigenvalue is proportional to the sum of signal and noise within each mode. 
The upper figure shows the value per mode, while the lower figure shows the normalized 
cumulative value. The stepped-pattern evident in the upper panel is due to the presence 
of degenerate eigenmodes which have identical eigenvalues (e.g. modes n = 2 and n = 3, 



related by parity as evident in Figure 4.1). Because the eigenmodes are computed from a 



whitened covariance matrix (see Section 4.2.2), the noise contribution within each mode is 
equal to 1. Subtracting this contribution leads to the plot of signal only: this shows that 
the signal-to-noise ratio is above unity only for the first 17 modes. Still, as seen in the lower 
panel, higher modes are required: the first 17 modes account for only 12% of the signal on 
average. To recover 70% of the signal in a particular reconstruction requires about 1000 
modes. At this point, each additional mode has a signal-to-noise ratio of less than 0.1. Such 
a small signal-to-noise ratio is a well-known aspect of cosmic shear studies. 
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Figure 4.4: This figure illustrates the reconstruction of a small patch of masked shear 
from simulated shear catalog, upper panels: The underlying noiseless shear signal (left), the 
observed, noisy shear signal (middle), and the unmasked reconstruction with 900 modes and 
q = 0.15. The amplitude of the noise is calculated using an intrinsic ellipticity o~ e = 0.3, with 
an average number density of n ga \ = 20 arcmin -2 . The large peak in the upper portion of 
the figure is well-recovered by the KL reconstruction, lower panels: The KL reconstruction 
of the shear in the presence of 20% masking, with increasing number of modes n. The 
mask is represented by the shaded regions in panels: within these regions, the value of the 



shear is recovered through KL interpolation (see Section 4.2.5). We see in this progression 
the effect of the KL cutoff choice: using too few modes leads to loss of information, while 
using too many modes leads to over-fitting within the masked regions (See Appendix [C| for 
a discussion of the choice of number of modes). 
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Figure 4.5: Here we show the same field as in Figure 4.4, reconstructed using n = 900 
modes, with increasing levels of mask coverage. The density of source galaxies has been 
increased to 100 arcmin" 2 , typical of a space-based weak lensing survey. At this noise level, 
smaller halos can be detected within the unmasked KL reconstruction (upperdeft panel). 
Even at a 50% masking level, the large peak at (RA,DEC) = (11.9,36.75) is adequately 
recovered. 
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Figure 4.6: The large-field convergence map derived using the method of Kaiser & Squires 



(1993) from the noiseless input shear (upper panels), the noisy input shear (middle panels), 
and the 900-mode, a = 0.15 reconstruction of the noisy shear, with 20% of pixels masked 
out (lower panels). The rightmost plots of each row cover one square degree. The larger- 
field maps are smoothed with Gaussian filters of width 4 and 2 arcmin, while the smaller 
fields are unsmoothed. These plots show that even with 20% masking of the input signal, 
the KL interpolation procedure recovers the most significant peaks, and offers improvement 
over the results derived from unmasked, observed shear. This improvement will be discussed 
mnre mia.ntita.tivelv helnw. 
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right panel shows the shear with shape noise for a DES-type survey {a e = 0.3, n ga [ = 
20 arcmin~ 2 ). To the eye, the signal seems entirely washed out by the noise. Nevertheless, 
the shear signal is there, and can be fairly well-recovered using the first 900 KL modes 

to 



(middle-left panel). For masked data, we must resort to the techniques of Section 4.2.5 



fill-in the missing data. The middle-right panel shows this reconstruction, with gray shaded 
regions representing the masked area. A visual comparison of the masked and unmasked n = 



900 panels of Figure 4.4 confirms qualitatively that the KL interpolation is performing as 
desired. This is especially apparent near the large cluster located at (RA,DEC)=(11.9,36.7). 



The remaining two lower panels of Figure 4.4 show cases of over-fitting and under-fitting of 
the shear data. If too few KL modes are used, the structure of the input shear field is lost. 
If too many KL modes are used, the masked regions are over-fit, causing the interpolated 
shear values to become unnaturally large. This observation suggests one rubric by which 
the ideal number of modes can be chosen; see the discussion in Appendix [Cj 



It is interesting to explore the limits of this interpolation algorithm. Figure 4.5 shows 
the KL reconstruction with increasing masked fractions, using a noise level typical of space- 
based lensing surveys (n ga i = 100 arcmin~ 2 ) Though the quality of the reconstruction 
understandably degrades, the lower panels show that large features can be recovered even 
with up to 50% of the pixels masked. 

In Figure [476] we provide a comparison of the convergence maps generated from the noise- 
less shear (upper panels) and the KL-reconstructed noisy shear with 20% of pixels masked 
(lower panels). The convergence maps are smoothed by a Gaussian filter to ease compar- 



ison with the quantitative results of Sections 4.4 4.5, where we explore the distribution of 
peaks through an aperture mass filter. The aperture mass filter amounts to a particular 



smoothing function over the convergence field (see Section 4.4.1, below). Comparison of the 



upper and lower panels of Figure 4.6 give a qualitative indication of the performance of KL: 
high-convergence regions are recovered remarkably well, while convergence peaks of lower 
magnitude are obscured by the background noise: as we show in the following sections, this 
obscuration is largely the result of shape noise in the simulated shear measurements. 

For a quantitative analysis of the effectiveness of the KL interpolation in convergence 
mapping, and the potential biases it introduces, a large-scale statistical measure is most 
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appropriate. In the following sections, we test the utility of this KL interpolation scheme 
within the framework of shear peak statistics. 

4.4 Shear Peak Statistics 

It has long been recognized that much useful cosmological information can be deduced 



from the masses and spatial distribution of galaxy clusters (e.g. Press & Schechter, 1974). 
Galaxy clusters are the largest gravitationally bound objects in the universe, and as such 



are exponentially sensitive to cosmological parameters (White et al. 1993). The spatial 
distribution of clusters and redshift evolution of their abundance and clustering is sensitive to 
both geometrical effects of cosmology, as well as growth of structure. Because of this, cluster 
catalogs can be used to derive constraints on many interesting cosmological quantities, 



including the matter density Qm and power spectrum normalization ag (Lin et al. 2003), 



the density and possible evolution of dark energy (Linder & Jenkins, 2003 Vikhlinin et al. 



2009), primordial non-gaussianities (Matarrese et al. 2000 Grossi et al. 2007), and the 



baryon mass fraction (Lin et al. 2003 Giodini et al. 2009) 



Various methods have been developed to measure the mass and spatial distribution of 
galaxy clusters, and each are subject to their own difficult astrophysical and observational 
biases. They fall into four broad categories: optical or infrared richness, X-ray luminosity 
and surface brightness, Sunyaev-Zeldovich decrement, and weak lensing shear. 

While it was long thought that weak gravitational lensing studies would lead to robust, 
purely mass-selected cluster surveys, it has since become clear that shape noise and projec- 
tion effects limit the usefulness of weak lensing in determining the 3D cluster mass function 



Hamana et al. 


2004, 


Hennawi & Spergel 


2005 Mandelbaum et al. 


2010 


VanderPlas et al. 



2011). The shear observed in weak lensing is non-locally related to the convergence, a mea- 
sure of projected mass along the line of sight. The difficulty in deconvolving the correlated 
and uncorrelated projections in this quantity leads to difficulties in relating these projected 
peak heights to the masses of the underlying clusters in three dimensions. Recent work 
has shown, however, that this difficulty in relating the observed quantity to theory may be 
overcome through the use of statistics of the projected density itself. 



Marian et al. (2009, 2010) first explored the extent to which 2D projections of the 3D 
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mass field trace cosmology. They found, rather surprisingly, that the statistics of the pro- 
jected peaks closely trace the statistics of the 3D peak distribution: in N-body simulations, 



both scale with the Sheth & Tormen (1999) analytic scaling relations. The same corre- 
lated projections which bias cluster mass estimates contribute to a usable signal: statistics 
of projected mass alone can provide useful cosmological constraints, without the need for 
bias-prone conversions from peak height to cluster mass. 

A host of other work has explored diverse aspects of these shear peak statistics, including 



tests of these methods with ensembles of N-body simulations (Wang et al. , 2009 Kratochvil 



et al. 2010 Dietrich & Hartlapi 2010), the performance of various filtering functions and 



peak detection statistics ( jPires et"ak 2009 Schmidt & Rozo, 2010; Kratochvil et al. 2011), 



exploration of the spatial correlation of noise with signal within convergence maps (Fan 



et al. 2010), and exploration of shear-peak constraints on primordial non-gaussianity (Ma- 



turi et al. 2011). The literature has yet to converge on the ideal mapping procedure: con- 



vergence maps, Gaussian filters, various matched filters, wavelet transforms, and more novel 
filters are explored within the above references. There is also variation in how a "peak" 
is defined: simple local maxima, "up-crossing" criteria, fractional areas above a certain 
threshold, connected-component labeling, hierarchical methods, and Minkowski functionals 
are all shown to be useful. Despite diverse methodologies, all the above work confirms that 
there is useful cosmological information within the projected peak distribution of cosmic 
shear fields, and that this information adds to that obtained from 2-point statistics alone. 



4-4-1 Aperture Mass Peaks 



Based on this consensus, we use shear peak statistics to explore the possible bias induced 
by the KL interpolation method outlined above. We follow the aperture mass methodology 



of Dietrich & Hartlap (2010): The aperture mass magnitude at a point 0q is given by 



M ap (0o) 



d 2 0Q NF w(tf = |0-0o \)jt(O;0 o ) 



(4.18) 
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where 74 9q) is the component of the shear at location 9 tangential to the line — Oq, and 



Qnfw is the NFW-matched filter function defined in Schirmer et al. (2007): 

Qksfw{x;x c ) oc 



1 



1 + e 



6-150x 



+ e 



tanh(x/x c 
-47+5(te x/x c 



(4.19) 



with x = i5/i? mffl and x c a free parameter. We follow Dietrich Sz Hartlap (2010) and set 
$max = 5.6 arcmin and x c 



0.15. The integral in Equation 4.18 is over the whole sky, 
though the filter function Q effectively cuts this off at a radius $ m ax- In the case of our 
pixelized shear field, the integral is converted to a discrete sum over all pixels, with i9 equal 
to the distance between the pixel centers: 



nfw {$ij)it(0i; Oj 



(4.20) 



where we have defined 



We can similarly compute the B-mode aperture mass, by substituting 7$ — > 7 X in Equa- 
tions 



4.18 4.20 (Crittenden et al. 2002). For pure gravitational weak lensing with an unbi- 



ased shear estimator, the B-mode signal is expected to be negligible, though second-order 
effects such as source clustering and intrinsic alignments can cause contamination on small 



angular scales (Crittenden et al. 2002 Schneider et al. 2002b) These effects aside, the B 



mode signal can be used as a rough estimate of the systematic bias of a particular analysis 
method. 

For our study, the aperture mass is calculated with the same resolution as the shear 
pixelization: 64 2 pixels per square degree. A pixel is defined to be a peak if its value is 
larger than that of the surrounding eight pixels: a simple local maximum criterion. 

4-4-2 The Effects of Masking 

When a shear peak statistic is computed across a field with masked regions, the masking 



leads to a bias in the peak height distribution (see Section 4.5 below). Moreover, due 
to the non-local form of the aperture mass statistic, a very large region is affected: in 
our case, a single masked pixel biases the aperture mass measurement of an area of size 
7r(2$ max ) 2 « 400 arcmin 2 . There are two naive approaches one could use when measuring 
the aperture mass in this situation: 
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Figure 4.7: Comparison of the masked and unmasked peak distributions, left panel: the 
peak distributions without the use of KL. The black line is the result with no masking, 
while the red and green lines show the two naive methods of correcting for the mask (see 



Section 4.4.2). right panel: the masked and unmasked peak distributions after applying 
KL. Neither naive method of mask-correction adequately recovers the underlying peak dis- 
tribution. It is evident, however, that the KL-based interpolation procedure recovers a mass 
map with a similar peak distribution to the unmasked KL map. It should be noted that the 
unmasked peak distribution (black line, left panel) is not identical to the unmasked peak 
distribution after application of KL (black line, right panel). This difference is addressed in 
Figure |4.8 
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Unweighted: Here we simply set the shear value within each masked pixel to zero, and 
apply Equation 4.20| The shear within the masked regions do not contribute to the 



peaks, so the height of the peaks will be underestimated. 
Weighted: Here we implement a weighting scheme which re-normalizes the filter Qnfw to 



is 



reflect the reduced contribution from masked pixels. The integral in equation 4.18 
replaced by the normalized sum: 

M ap (£) = ^nfw(^)7^>(^) (4 21) 

where w(6j) = if the pixel is masked, and 1 otherwise. This should correct for the 
underestimation of peak heights seen in the unweighted case. 

In order to facilitate comparison between this weighted definition of M ap and the nor- 
mal definition used in the unmasked and unweighted cases, we normalize the latter by 
Y2j QNFwfAj); which is a constant normalization across the field. 

Note that in both cases, it is the shear that is masked, not the M ap peaks. Aperture 
mass is a non-local measure, so that the value can be recovered even within the masked 
region. This means that masking will have a greater effect on the observed magnitude of the 
peaks than it will have on the count. In particular, on the small end of the peak distribution, 
where the peaks are dominated by shape noise, the masking of the shear signal is likely to 



have little effect on the distribution of peak counts. This can be seen in Figure 4.8 



4-4-3 M ap Signal-to-Noise 

It is common in shear peak studies to study signal-to-noise peaks rather than directly study 



aperture-mass or convergence peaks (e.g. Wang et al. 2009 Dietrich & Hartlap, 2010 



Schmidt Sz Rozo, 2010). We follow this precedent here. The aperture mass (Eqn. 4.20) is 
defined in terms of the tangential shear. Because we assume that the shear measurement 
is dominated by isotropic, uncorrelated shape noise, the noise covariance of M ap can be 
expressed 

WM]ij = (Map$)A*ap$)> 
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9 ^NFW^ifc)^ jlkkSi 



(4.22) 



where we have used the fact that shape noise is uncorrelated: [A/" 7 ]ij = (niUj) oc Sy. 

In the case of a KL-reconstruction of a masked shear field, the reconstructed shear has 



non-negligible correlation of noise between pixels. From Equations 4.15 4.17, it can be 
shown that 



w 2 



(4.23) 



The covariance matrix Af^ is no longer diagonal, but the noise remains isotropic under the 
linear transformation, so that Af^ t = The aperture mass noise covariance can thus 

be calculated in a similar way to the non-KL case: 



(4.24) 



k e 



This expression can be computed through standard linear algebraic techniques. The aper- 
ture mass signal-to-noise in each pixel is given by 



[S/Nji = M ap (9i)/y/W. 



M\ii 



(4.25) 



4.5 Discussion 



4-5.1 M ap Peak Distributions 



In Figures 4.7 4.10 we compare the peak distribution obtained with and without KL. We 



make three broad qualitative observations which point to the efficacy of KL in interpolation 
of masked shear fields, and in the filtration of shape-noise from these fields. We stress the 
qualitative nature of these results: quantifying these observations in a statistically rigorous 
way would require shear fields from an ensemble of cosmology simulations, which is beyond 
the scope of this work. These results nevertheless point to the efficacy of KL analysis in 
this context. 



KL filtration corrects for the bias due to masking. Figure 4.7 compares the effect 
of masking on the resulting peak distributions with and without KL. The left panel shows 
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Unmasked Peak Distributions 



noisy (no KL) 
noisy (with KL) 
noiseless 




0.035 0.040 



Figure 4.8: Comparison of the distribution of M ap peaks for unmasked shear, before 
and after filtering the field with KL (dotted line and dashed line, respectively). The peak 
distribution in the absence of noise is shown for comparison (solid line). It is clear that 
the addition of shape noise leads to many spurious M ap peaks: noise peaks outnumber true 
peaks by nearly a factor of 10 for smaller peak heights. Filtering by KL reduces these 
spurious peaks by about a factor of 3, and for larger peaks leads to a distribution similar 
in scale to that of the noiseless peaks. 
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Figure 4.9: The comparison between B-mode peak distributions and the peak distributions 
for a shear field composed entirely of noise. As expected, the B-mode peak distributions are 
largely consistent with being due to noise only. Because of this, we can use B-mode peaks 
as a rough proxy for the noise. 
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Figure 4.10: top panel: The cumulative distributions of peaks in signal-to-noise, for peaks 



with M ap /o"A/ > 3.25. This is the statistic used by Dietrich &; Hartlap (2010) to discrim- 
inate between cosmological models, bottom panel: The ratio of B-mode to E-mode peak 
distributions. Filtration by KL reduces the relative number of B-mode peaks by about 1/3. 



Because B-mode peaks are a proxy for contamination by shape noise (see Figure 4.9), this 
indicates that KL-filtration results in peak distributions less affected by statistical errors. 
KL also reduces the total number of both E and B peaks by about 2/3; this effect can also 



be seen in Figure 4.8 
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the unmasked noisy peak distribution, and the masked peak distributions resulting from 
the weighted and unweighted approaches described in Section |4.4.2 Neither method of 
accounting for the masking accurately recovers the unmasked distribution of peak heights. 
The unweighted approach (green line) leads to an underestimation of peak heights. This 
is to be expected, because it does not correct for the missing information in the masked 
pixels. The weighted approach, on the other hand, over-estimates the counts of the peaks. 
We suspect this is due to an analog of Eddington bias: the lower signal-to-noise ratio of the 
weighted peak statistic leads to a larger scatter in peak heights. Because of the steep slope 
of the peak distribution, this scatter preferentially increases the counts of larger peaks. This 
suspicion is confirmed by artificially increasing the noise in the unmasked peak function. 
Increasing a e from 0.30 to 0.35 in the unmasked case results in a nearly identical peak 
function to the weighted masked case. 



The right panel of Figure 4.7 shows that when KL is applied to the shear field, the 
distribution of the masked and unmasked peaks is very close, both for E-mode and B-mode 
peaks. This indicates the success of the KL-based interpolation outlined in Section |4.2.5 



Even with 20% of the pixels masked, the procedure can recover a nearly identical peak 
distribution as from unmasked shear. 

KL filtration reduces the number of noise peaks. Comparison of the unmasked 



lines in the left and right panels of Figure 4.7 shows that application of KL to a shear 
field results in fewer peaks at all heights. This is to be expected: when a reconstruction 
is performed with fewer than the total number of KL modes, information of high spatial 
frequency is lost. In this way, KL acts as a sort of low-pass filter tuned to the particular 



signal-to- noise characteristics of the data. Figure 4.8 over-plots the KL and non-KL peak 
distributions with the noiseless peak distribution. From this figure we see that the inclusion 
of shape noise results in nearly an order-of-magnitude more peaks than the noiseless case. 
The effect of noise on peak counts lessens slightly for higher- M ap peaks: this supports the 



decision of Dietrich & Hartlap (2010) to limit their distributions to peaks with a signal- 
to-noise ratio greater than 3.25: the vast majority of peaks are lower magnitude, and are 
overwhelmed by the effect of shape noise. 

Omission of higher-order KL modes of shear field reduces the number of these spurious 
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peaks by a factor of 3 or more. For low-magnitude peaks, (M ap <~ 0.02), KL still produces 
peak counts which are dominated by noise. For higher-magnitude peaks, the number of 
observed KL peaks more closely approaches the number of peaks in the noiseless case. 

KL filtration reduces the presence of B-modes. To first order, weak lensing shear 
is expected to consist primarily of curl-free, E-mode signal. Because of this, the presence 
of B-modes can indicate a systematic effect. It is not obvious that filtration by KL will 



maintain this property: as noted in Section 4.2.4 KL modes individually are agnostic to E- 
mode and B-mode information. E&B information is only recovered within a complex-valued 
linear combination of the set KL modes. 



Figure 4.9 shows a comparison between the unmasked B-mode peak functions from 



Figure 4.7 and the associated E-mode peak functions due to shape- noise only. For both 
the non-KL version and the KL version, the B-mode peak distributions closely follow the 
distributions of noise peaks. This supports the use of B-mode peaks as a proxy for the 
peaks due to shape noise, even when truncating higher-order KL modes. 



The near-equivalence of B-modes and noise-only peaks shown in Figure 4.9 suggests a 
way of recovering the true peak function, by subtracting the B-mode count from the E-mode 
count as a proxy for the shape noise. This approach has one fatal flaw: because it involves 
computing the small difference between two large quantities, the result has extremely large 
uncertainties. It should be noted that this noise contamination of small peaks is not an 
impediment to using this method for cosmological analyses: the primary information in 
shear peak statistics is due to the high signal-to-noise peaks. 

In the top panel of Figure [4. 10 , we show the cumulative distribution of peaks in signal- 
to-noise, for peaks with S/N > 3.25: the quantity used as a cosmological discriminant in 



Dietrich Sz Hartlap (2010). The difference in the total number of E-mode peaks in the KL 



and non-KL approaches echoes the result seen in Figure 4.8 truncation of higher-order 
KL modes acts as a low-pass filter, reducing the total number of peaks by a factor of ~ 3. 



More interesting is the result shown in the lower panel of Figure 4.10, where the ratio of 
B-mode peak counts to E-mode peak counts is shown. Before application of KL, the B-mode 
contamination is above 30%. Filtration by KL reduces this contamination by a factor of 
~ 3, to about 10%. This indicates that the truncation of higher-order KL modes leads to 
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a preferential reduction of the B-mode signal, which traces the noise. This is a promising 
observation: the counts of high signal-to-noise peaks, which offer the most sensitivity to 



cosmological parameters (Dietrich & Hartlap, 2010), are significantly less contaminated by 
noise after filtering and reconstruction with KL. This is a strong indication that the use of 
KL could improve the cosmological constraints derived from studies of shear peak statistics. 



Note that in Figure 4.10 we omit the masked results for clarity. The masked cumulative 
signal-to-noise peak functions have B/E ratios comparable to the unmasked versions, so the 
conclusions here hold in both the masked and unmasked cases. 



4-5.2 Remaining Questions 

The above discussion suggests that KL analysis of masked shear fields holds promise in 
constraining cosmological parameters of shear peaks in both masked and unmasked fields. 
KL greatly reduces the number of spurious noise peaks at all signal-to-noise levels. It 
minimizes the bias between masked and unmasked constructions, and leads to a factor of 
3 suppression of the B-mode signal, which is a proxy for the spurious signal introduced 
through shape noise. 

The question remains, however, how much cosmological information is contained in the 
KL peak functions. The reduction in level of noise peaks is promising, but the omission 
of higher-order modes in the KL reconstruction leads to a smoothing of the shear field on 
scales smaller than the cutoff mode. This smoothing could lead to the loss of cosmologically 
useful information. In this way, the choice of KL mode cutoff can be thought of as a 
balance between statistical and systematic error. The effect of these competing properties 
on cosmological parameter determination is difficult to estimate. Quantifying this effect 
will require analysis within a suite of synthetic shear maps, similar to the approach taken 



in previous studies (e.g. Dietrich & Hartlap, 2010 Kratochvil et al. 2010), and will be the 
subject of future work. 

Another possible application of KL in weak lensing is to use KL to directly constrain 
2-point information in the measured shear data. In contrast to the method outlined in 
the current work, KL basis functions can be computed for the unmasked region only. The 
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projection of observed data onto this basis can be used to directly compute cosmological 
parameters via the 2-point function, without ever explicitly calculating the power spectrum. 



This is similar to the approach taken for galaxy counts in Vogeley & Szalay (1996). This 
approach is the subject of Chapter 5. 
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Chapter 5 

APPLICATION TO COSMOS LENSING DATA 

This chapter will cover the application of KL parameter estimation to lensing data 
from the COSMOS survey, a ~ 1.5 square degree field observed by the Hubble Space 
Telescope. We use KL to express the data within the optimal orthonormal basis dictated 



by the survey geometry, and use the Bayesian inference framework developed in {2 A to 
perform a simple parameter estimation using the KL basis in place of the usual Fourier 
basis. From this, we obtain parameter constraints on Qm and o"s which are similar to those 
from conventional angular correlation analyses, with a framework that is free from the 
systematic errors associated with incomplete sky coverage and irregular survey geometry. 

This chapter represents a first exploration of this problem; the results consider a simple 
two dimensional analysis for two cosmological parameters. KL can naturally be extended 
to 3D tomographic approaches with any number of parameters; this will be the subject of 
future work. 

5.1 Introduction 

In this chapter we explore the evaluation of cosmological likelihoods using KL analysis of 
shear fields. In Chapter 4 we explored the use of KL analysis in shear surveys, focusing on 
the ability of KL modes to help fill-in missing information within the context of weak lensing 
convergence mapping and studies of the peak statistics of the resulting mass maps. Here 
we follow a different approach: we use KL analysis to aid in the calculation of cosmological 
likelihoods using two-point statistics within a Bayesian framework. This draws upon similar 
work done previously to constrain cosmological parameters using number counts of galaxy 



surveys (Vogeley & Szalay, 1996, Pope et al. 2004). 



In §5.2|we review and discuss the strengths and weaknesses of constraining cosmological 



quantities using two-point shear statistics. In { 5.3 we review KL analysis and its application 
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to shear surveys. In £5.4 we describe the COSMOS shear data used in this analysis, and we 



discuss these results in £5.5 



5.2 Two-point Statistics in Weak Lensing 



As noted and outlined in Chapter 1, the large-scale structure of the Universe provides a 
powerful probe of cosmological parameters. Through gravitational instability, the initial 
matter fluctuations have grown to the nonlinear structure we see today. This happens in 
a hierarchical manner, with the smallest structures collapsing before the largest. One of 
the most powerful probes of this structure is the redshift-dependent power spectrum of 
matter density fluctuations, Pk(z), which gives the amplitude of the Fourier mode with 
wave- number A; at a redshift z. This approach has often been used to measure cosmological 



parameters through optical tracers of the underlying dark matter structure (e.g. Tegmark 



et al. , 2006). In this chapter we explore the use of weak lensing measurements of the matter 



power spectrum. Recent work has shown the power of this lensing-based approach (Ichiki 



et al. 2009; Schrabback et al., 2010). 



The are two approaches to measuring two-point information are mathematically equiv- 
alent: the power spectrum V(£), and its Fourier transform £(0). In practice, the most 
commonly used method of measuring two-point information is through correlation func- 
tions (see 



Schneider et al. , 2002a). The main advantage of correlation functions is their ease 



of measurement: they can be straightforwardly estimated from the positions and shapes of 
galaxies, even in very complicated survey geometries. Their disadvantage is that the signal is 
highly correlated between different scales. Accounting for this correlation is very important 
when computing cosmological likelihoods, and often requires large suites of simulations. 

Shear power spectra, on the other hand, have a number of nice properties. Compared 
to correlation functions, they provide a simpler mapping to theoretical expectations. They 
have weaker correlations between different multipoles: on the largest scales, where structure 
is close to Gaussian, the scales are expected to be statistically independent. Even on small 
scales where non-Gaussianity leads to correlated errors, these correlations have a relatively 



small effect on derived cosmological constraints (Takada & Jain, 2009). The disadvantage of 



shear power spectra as direct cosmological probes is the difficulty of measuring them from 
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data. In particular, survey geometry effects such as incomplete sky coverage and masking 
can lead to mixing of power on all angular scales. This mode-mixing is a direct result of the 
loss of orthogonality: spherical harmonics are orthogonal over the entire sky, but are not 
necessarily orthogonal over the incomplete patch of the sky represented by lensing surveys. 
Even in the case of future all-sky surveys, the masking from foreground sources will pose a 
problem. This means that the spherical harmonic decomposition on which power spectra 
are based is not unique for realistic surveys. It may be possible to construct a survey in 



order to limit the magnitude of these effects (see Kilbinger & Schneider 2004, Kilbinger 



&; Munshi 2006, for some approaches). There have also been a few attempts to correct 



for this difficulty through direct deconvolution of the survey geometry from the correlation 



signal (Brown et al. 2003; Hikage et al. 2011), but because of the computational difficulty 



involved with these methods, results based on correlation function measures remain more 
common. Here we explore an alternate approach which relies on constructing a new set of 
orthogonal modes for the observed survey geometry. Because the new modes are orthogonal 
by construction, one can avoid the difficulties associated with mode mixing. We propose to 
take this latter approach using Karhunen-Loeve (KL) analysis. 

5.3 KL for Parameter Estimation 



As discussed more fully in Chapter 2, KL analysis and the related Principal Component 
Analysis are well-known statistical tools which have been applied in a wide variety of as- 



trophysical situations, from e.g. analysis of the spatial power of galaxy counts (Vogeley & 



Szalay, 1996; Szalay et al. 2003 Pope et al. 2004) to characterization of stellar, galaxy. 



and QSO spectra ( |Connolly et~al} |1995[ |Connolly Szalay[ |1999[ |Yip et al.[ |2004a|b| ), to 



studies of noise properties of weak lensing surveys ( Kilbinger & Munshi 2006 Munshi & 



Kilbinger 2006), and a host of other situations too numerous to mention here. Informally, 



the power of KL/PCA rests in the fact that it allows a highly efficient representation of a 
set of data, highlighting the components that are most important in the dataset as a whole. 
Though the framework is discussed more completely in Chapter 2, we will review the most 
important points here. The discussion of KL analysis below derives largely from Vogeley & 



Szalay (1996), reexpressed for application in cosmic shear surveys. 
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Any D-dimensional data point may be completely represented as a linear combination of 
D orthogonal basis functions: this is a geometrical property, closely linked to the free choice 
of coordinate axes used to represent points in a D-dimensional space. For example, the data 
may be N individual galaxy spectra, each with flux measurements in D wavelength bins. 
Each spectrum can be thought of as a single point in D-dimensional parameter space, where 
each axis corresponds to the value within a single wavelength bin. Geometrically, there is 
nothing special about this choice of axes: one could just as easily rotate and translate the 
axes to obtain a different but equivalent representation of the same data. 

In the case of of a shear survey, our single data vector is the set of cosmic shear mea- 
surements across the sky. We will divide the sky into N cells in angular and redshift 
space, at coordinates Xi = (0 x ,i,0y,i,Zi) These cells may be spatially distinct, or they 
may overlap. From the ellipticity of the galaxies within each cell, we estimate the shear 
7i = 7°(xi) = j{xi) + n 7 (a3j) where j(xi) is the true underlying shear, and n 7 (a3j) is the 
measurement noise. Our data vector is then 7 = [71, 72 • • • 7tv] t - 

We seek to express our set of measurements 7 as a linear combination of N (possibly 
complex) orthonormal basis vectors {S!fj(xi,j = 1,N)} with complex coefficients ay. 

N 

For conciseness, we'll create the matrix * whose columns are the basis vectors StFj, so that 
the above equation can be compactly written 7 = *a. Orthonormality of the basis vectors 
leads to the property VE^* = I, where i" is the identity matrix: that is, * is a unitary matrix 
with i Sf~ 1 = . Observing this, we can easily compute the coefficients for a particular data 
vector: 

a = * f 7- (5-2) 

We will be testing the likelihood of a particular set of coefficients a. The statistical properties 
of these coefficients can be written in terms of the covariance of the observed shear: 

(aa^ = * f ^77^ * = (5.3) 

where we have defined the observed shear correlation matrix £ = (77'), and angled braces 
(• • •) denote expectation value or ensemble average of a quantity. 
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In order to perform a likelihood analysis on the coefficients a, we will require that a be 
statistically orthogonal: 



aa 



! >^' (5-4) 



Comparing Equations 5.3 & 5.4 we see that the desired basis functions are the solution of 
the eigenvalue problem 

£*j = A,*, (5-5) 

where the eigenvalue \j = (of). Comparison of this to the KL framework outlined in Chap- 
ter 2 shows that the unique basis with these properties is given by the KL decomposition 
of the shear field 7, represented by the correlation matrix of observations By conven- 
tion, we'll again order the eigenvalue/eigenvector pairs such that A« > Aj+iVi G (1, N — 1). 
Expansion of the data 7 into this basis is the discrete form of KL analysis. 

In chapter 2 we discussed the Uniqueness, Efficiency, and Signal-to-noise optimality of 
KL modes. In particular, we showed that if signal and noise are uncorrelated, then the 
covariance of the observed shear can be decomposed as 

£ = S + Af (5.6) 

where S is the covariance of the signal, and Af is the covariance of the noise. Because the 
noise covariance Af = (n 7 n 7 t) is proportional to the identity by assumption, diagonaliza- 
tion of £ results in a simultaneous diagonalization of both the signal S and the noise Af. 
Based on this signal-to-noise optimization property, KL modes can be proven to be the 



optimal basis for testing of spatial correlations (see Appendix A of Vogeley & Szalay, 1996) 



5.3.1 Shear Noise Properties 

The signal-to-noise properties of shear mentioned above are based on the requirement that 
noise be "white", that is, the noise covariance is Af = (n 7 n 7 ^) = a 2 1. Noise in measured 
shear is affected mainly by the intrinsic ellipticity and source density, but can also be prone to 
systematic effects that lead to noise correlations between pixels. When the survey geometry 
leads to shear with more complicated noise characteristics, a whitening transformation can 
be applied. 
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Given the measured data 7 and noise covariance Af, we can define the whitened shear 

y = AT x / 2 7 (5.7) 
With this definition, the shear covariance matrix becomes 



= M- l ' 2 [S + M]M- 1 ' 2 

= Af~ 1/2 SAf~ 1/2 + I (5.8) 

We see that the whitened signal is S' = Af^^SAf -1 ^ 2 and the whitened noise is Af' = I, 
the identity matrix. This transformation whitens the data covariance, so that the noise in 
each bin is constant and uncorrelated. Given the whitened measurement covariance we 
can find the KL decomposition that satisfies the eigenvalue problem 

= A^', (5.9) 

With KL coefficients given by 

a' = V'W- 1/2 j (5.10) 
Note that because (7) = 0, the expectation value of the KL coefficients is 

(a') = Af-^Cy) 

= (5.11) 

For the remainder of this chapter, it will be assumed that we are working with whitened 
quantities. The primes will be dropped for notational simplicity. 

5.3.2 Constructing the Covariance Matrix 

In many applications, the data covariance matrix can be estimated empirically, using the 
fact that 



i=i 
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Unfortunately, in surveys of cosmic shear, we have only a single sky to observe, so this 
approach does not work. Instead, we can construct the measurement covariance analytically 
by assuming a theoretical form of the underlying matter power spectrum. 

The measurement covariance between two regions of the sky A^ and Aj is given by 



Sij + A/" ij 

/ d 2 Xi / d 2 Xji + (\xi - Xj 



+ Afi 



(5.13) 



where £+(0) is the "+" shear correlation function. £+(0) is expressible as an integral over 



the shear power spectrum weighted by the zeroth-order Bessel function (see, e.g. Schneider 



et al. 2002a): 



i r°° 

2Wo 



dt IP^(1)J {&) 



(5.14) 



The angular shear power spectrum P 7 (£) can be expressed as a weighted line-of-sight integral 
over the matter power 

"Xs 



dxW 2 (x)x- 2 Ps(k=-;z(x) 



(5.15) 



Here x 1S the comoving distance, Xs is the distance to the source, and W(x) is the lensing 
weight function, 

(5.16) 



2o(x) "3 i x 

where n(z) is the empirical redshift distribution of galaxies. The nonlinear mass fluctuation 
power spectrum Pg(k, z) can be predicted semi-analytically: in this work we use the halo 



model of Smith et al. (2003). With this as an input, we can analytically construct the 



measurement covariance matrix £ using Equations 5.13 5.16 



5.3.3 Cosmological Likelihood Analysis with KL 

The cosmological analysis with KL consists of the following steps: from the survey geometry 



and galaxy ellipticities, we measure the shear 7, estimate the noise covariance J\ (see \ 5.4.1 ) 



and derive the whitened covariance matrix From £ we compute the KL basis \l/ and A. 
Using the KL basis, we compute the coefficients a = ^ J\f~ l l 2 ~f . Given these KL coefficients 
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a, we use a Bayesian framework to compute the posterior distribution of our cosmological 
parameters. 



The problem of Bayesian inference with KL was discussed in {2 A Here we will briefly 
outline the portions relevant to this chapter. Given observations D and prior information /, 
Bayes' theorem specifies the posterior probability of a model described by the parameters 

PM}\DI) = PWm ^jffp (5-17) 
The term on the left of the equality is the posterior probability of the set of model parameters 
{0i}, which is the quantity we are interested in. The likelihood function for the observed 
coefficients a enters into the numerator P(D\{0i}I). The denominator P(D\I) is essentially 
a normalization constant, set so that the total probability over the parameter space equals 
unity. 

For a given model {9i}, we can predict the expected distribution of model KL coefficients 

C {6i} = ( a m a \e,}) 

Using this, the measure of departure from the model m is given by the quadratic form 

X 2 = a)C- { l } a (5.19) 

The likelihood is then given by 

£(a|{0i}) = (2^)"/ 2 |det(q eJ )r 1/2 exp(- X 2 /2) (5.20) 
where n is the number of degrees of freedom: that is, the number of eigenmodes included 



in the analysis. The likelihood given by Equation 5.20 enters into Equation 5.17 when 
computing the posterior probability. 

5.4 COSMOS data 

To test the KL likelihood formalism, we use a shear catalog derived from the COSMOS 
survejQ A full description of this catalog and detailed tests of its systematics are presented 



1 We are grateful to Tim Schrabback et al. for making these data available to us 
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Schrabback et al. (2010 hereafter S10); we will summarize some relevant details here. 



The catalog contains shape measurements of 446,934 source galaxies in a 1.64 square-degree 
field. The "bright" sample of 194,976 galaxies are those with well-behaved photometric 



redshifts drawn from the COSMOS30 pipeline ( |Hildebrandt et al.[ [20091 S10 )> The angular 
distribution of these galaxies is shown in the upper panel of Figure |5.1[ and the redshift 



distribution is shown in the upper panel of Figure 5.2. The redshift distribution is marked 
by spikes indicating the presence of clusters of galaxies at the same redshift. 

The remaining 446,909 galaxies are too faint to have been included in the reference 



catalog (the COSMOS30 redshifts are limited to i + < 25; See |Ilbert etaL] |2009| ), and S10 
estimates their redshift distribution using the empirical relationship between redshift and 
absolute z-band magnitude. The spatial and redshift distributions of this "faint" galaxy 



sample are shown in the lower panels of Figures 5.1 and 5.2, respectively. 

In addition, S10 identifies potential catastrophic outliers in the redshifts. Photometric 
redshifts gain significant leverage from broad spectral features such as the Lyman and 
Balmer spectral breaks. The Balmer limit is ~ 364nm, while the Lyman limit is ~ 91nm, so 
that the Balmer limit of a galaxy at redshift zq is at the same observed wavelength as the 
Lyman limit at redshift zq + 3. This can lead to a degeneracy in redshift determination that 
results in catastrophic outliers - that is, low redshift galaxies identified as high redshift, 
or high redshift galaxies identified as low redshift. In shear studies, the former acts to 
dilute the high-z shear signal, while the latter acts to add spurious signal at low redshift. 
To prevent the latter effect from affecting results, we follow S10 and remove galaxies with 
i + > 24 and redshifts z < 0.6. S10 provides several tests which show that this cut does not 
generate appreciable systematic error. 

S10 performs a classical shear correlation function analysis to find constraints on as 
and Ojvf that are consistent with those derived from WMAP: for a flat ACDM cosmol- 
ogy, they find with 63% confidence o- 8 (f2 A //0.3) a51 = 0.79 ± 0.09. S10 performs both a 
two-dimensional analysis and a three dimensional analysis: the constraints from each are 
consistent, with a slightly better figure of merit for the £Im vs. cs constraint when the 
analysis is computed within several redshift bins. The strength of the 3D treatment comes 
when we drop assumptions about flatness or the dark energy equation of state: for a ACDM 
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Bright Objects (N = 194976) 




(deg) 



Faint Objects (N = 251958) 




RA (deg) 

Figure 5.1: Angular locations of the 194,976 COSMOS galaxies with photometric redshift 
measurements (top panel) and the 251,958 COSMOS galaxies without photometric redshift 
estimates (bottom panel). The masking due to bright foreground sources is evident in both 
panels. 
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Bright Catalog Redshifts 




184188 galaxies 



Distribution Fit: 

p(z)=z a exp[-(z/z ) b ] 
z =0.65 
a = 1.57 
b=1.31 



Faint + Bright galaxies 
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.446909 galaxies 



Distribution Fit: 

p(z)=z a exp[-(z/z ) b ] 
z =0.64 
a = 1.15 
b=1.02 




Figure 5.2: Redshift distributions of the COSMOS data. The top panel shows the distri- 
bution of photometric redshifts of the shear catalog cross-matched with the COSMOS30 



photometric redshift catalog (Ilbert et al. 2009), while the bottom panel includes the in- 
ferred redshift distribution of the remaining faint galaxies. 
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cosmology with varying dark energy equation of state w, S10 finds at 90% confidence that 
w < —0.41. Though these constraints offer only a slight improvement over prior informa- 
tion from WMAP constraints from measurements of the CMB, we must note that they are 
derived from just over 1 square degree of observations, while the WMAP constraints use 
the full sky. Future wide-field lensing surveys will be able to place much more competitive 
constraints on these parameters. 

Here we will not duplicate all the various analyses of S10: instead we will use the KL- 
based estimation formalism of §2.4| with shear eigenbases computed for the observed field 



via the formalism of {2.5 This will enable us to constrain two-point information using the 
KL formalism. 



5.4-1 Intrinsic Ellipticity estimation 

In order to apply the KL analysis techniques discussed above and in Chapter 2, we require 
an accurate determination of the noise for the observed shear. Assuming systematic errors 
are negligible, shape noise should be dominated by shot noise, which scales as Afu = a^/rii, 
with rii representing the number of galaxies in bin i. 

To test this assumption, we perform a bootstrap resampling of the observed shear in 
square pixels that are two arcminutes on a side. Generating 1000 bootstrap samples within 
each pixel, we compute the variance in each pixel. From Poisson statistics, we would expect 
the variance in each pixel to scale inversely with the number of measurements within each 
pixel: with this in mind we plot in Figure |5.3| the variance vs the number of galaxies and 
fit a curve of the form 

2 

ol = (5.21) 

"gal 

where Gi n t is the intrinsic ellipticity dispersion of the population. As shown in the upper 
panel of Figure [573] , the best-fit curve has Ui n t = 0.393. The residual distribution, shown in 
the lower panel of the figure, is close to Gaussian as expected. 

In this figure, we see that the pixel-to-pixel fluctuation in shape noise is only a few 
percent. For the analysis below, we use for each pixel the noise estimates derived the 
bootstrap resampling within each pixel. Because bootstrapping is inaccurate for pixels with 
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Results for 1000 bootstrap resamples (2' pixels) 




300 
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Figure 5.3: Bootstrap estimates of the shape noise for each pixel. The estimates reflect an 
intrinsic ellipticity of 0.393 ± 0.013. 



a small number of galaxies, if a pixel has fewer than 10 sources we use the best-fit estimate 
for the noise, Afu = crl/ni with a e = 0.392. Pixels with zero galaxies (i.e. masked pixels) 



are treated using the techniques developed in Section 5.3 



5.4-2 Whitened KL modes 

Using the pixel-by-pixel noise estimates from the previous section, we can now follow the 
formalism of §5.3| and construct the optimal orthonormal basis across the survey window 
defined by the selection function of the bright galaxies from the COSMOS survey. We use 
pixels that are two arcminutes on a side, in a grid of 40 x 41 = 1640 total pixels. We whiten 
the theoretical correlation matrix according to the noise properties of the observed data, 
and compute the eigenvalue decomposition of the resulting correlation matrix. 



The first nine eigenmodes for the bright galaxy sample are shown in Figure 5.4 It is 
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interesting to compare these to the modes shown in Figure 4.1, which are derived under 
the assumption that each pixel has the same number of sources, and thus the same noise 
properties. The window function of the COSMOS survey is clearly present, as can be seen 
by comparing the masking apparent in the eigenmodes to that of the galaxy distribution 



shown in the upper panel of Figure 5.1 Moreover, the asymmetry of the mask acts as a 
perturbation that destroys the rotational symmetry evident in the idealized eigenmodes of 



the previous chapter (See Figure 4.1). 



The eigenvalues of the whitened correlation matrix are shown in Figure 5.5 Because 
the covariance matrix is whitened, the noise is normalized to 1 within each mode. Similar 
to the results seen in the previous chapter, only a very small number of modes have signal- 
to- noise greater than 1. This figure also shows that the signal drops to zero at just over 
1500 modes. This is due to the survey mask: approximately 120 of the 1640 modes are 
completely masked, such that they have no signal and do not contribute to the correlation 
matrix. 

As discussed above, an advantage of KL is its ability to yield an optimal low-rank 
approximation of a set of observations, by truncating the low signal-to- noise modes in 
a reconstruction. The choice of which modes to truncate for a reconstruction or other 
analysis is not straightforward: as discussed in Chapters 3-4, this decision amounts to a 
tradeoff between systematic bias and statistical error. Below we impose a cutoff for modes 
with signal-to-noise ratios of less than ~ 1/10, corresponding to mode number 800. This 
lies approximately at the inflection point of the signal-to-noise curve. 



5.4-3 Is our shear Gaussian? 

The KL formalism for shear analysis assumes that the shear is well-described by a Gaussian 
random field described by a covariance matrix, with mean zero. If this is the case, then (by 
the arguments of Chapter 2) we would expect the observed KL coefficients of the whitened 
signal to be Gaussian distributed with zero mean and variance equal to the associated eigen- 



value. Figure 5.6 shows a histogram of the observed coefficients scaled by the corresponding 



eigenvalue. As is evident, both the real part and the imaginary part of the scaled coefficients 
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Figure 5.4: The first nine 2D KL signal-to-noise eigenmodes for the COSMOS bright 
objects. This uses square pixels that are two arcminutes on a side, leading to 41 x 40 = 1640 
pixels over the entire field. Compare these results to the KL modes shown in Figure 4.1 
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Figure 5.5: The distribution of KL eigenvalues for the eigenmodes shown in Figure 5.4 



There are 41 x 40 = 1640 pixels, but approximately 90 of these contain no sources and 
are part of the mask. This is reflected in the fact that the final 90 KL modes have zero 
eigenvalue. 
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real, n = 1544 




Figure 5.6: The histogram of normalized coefficients aj/yAi- If the shear is truly a Gaussian 
random field, this distribution should be a Gaussian with unit variance. 



are consistent with being drawn from a standard normal distribution. This is consistent 
with our assumption that the shear is drawn from a Gaussian random field, and that the 



noise properties estimated in £5.4.1 are accurate. 



5.4-4 Relationship to Power Spectrum 



As we did in Figure 4.2 for each KL mode we compute the associated two-dimensional 
power spectrum to determine the relationship between each mode and the Fourier power it 
represents. For the unmasked KL modes explored in the previous chapter, this relationship 
displayed a fairly tight scatter between KL mode number and Fourier mode number. As 



seen in Figure 5.7, however, we see that the masked KL modes have a much larger scatter 
in associated Fourier modes, especially for higher KL mode numbers. 

The analysis reflected in this plot can help in the choice of which KL modes to trun- 
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cate: the pixel scale is two arcmin, which corresponds to a dimensionally- averaged Nyquist 
frequency of 

1 \ / 2-7rrad \ 

7200. (5.22) 



\/2 J V 2 arcmin J 

So modes which do not have significant Fourier power on angular scales I < 7200 are likely 
to be limited in their usefulness for parameter estimation from shear on this grid. This 
scale does not tell the whole story, however. The KL analysis tells us that the smaller scales 
probed by the higher-order modes have progressively smaller signal-to-noise ratios. For this 
reason, we choose the mode cutoff at scales less than 7200, which corresponds to n ~ 800 
modes. As was the case in Chapter 4, the optimal choice of mode cutoff is hard to quantify 
precisely, and represents a fundamental tradeoff between statistical and systematic errors. 
Modes larger than our cutoff of 800 have an expected signal-to-noise of less than 1/10. 

5.5 Results 

The result of the KL-based Bayesian inference for cosmological parameter estimation is 



shown in Figure 5.8 To compute the eigenmodes, we assume a flat ACDM cosmology with 
JljVf = 0.27, Ql = 0.73, ag = 0.812, and h = 0.71. These KL modes are computed for the 
angular and redshift distribution of the bright galaxy sample, and the KL-based Bayesian 
inference is performed assuming a flat cosmology. We use only a single redshift bin in this 
case, which is comparable to the first analysis performed in S10. 

This leads to a best-fit cosmology £1 m = 0.23 ± 0.06, a% = 0.83 ± 0.09, where the error 
bars represent la deviations about the maximum a priori value. This does not capture the 
entire story, however, as there is a strong degeneracy between the parameters (note that 
WMAP data offers complementary constraints in this plane, and can be used to break this 
degeneracy: see S10). Following S10, we describe this degeneracy by computing a power-law 
fit to the posterior distribution, to find 

o- 8 (^Af/0.3) L03 = 0.88 ± 0.14. (5.23) 

This should be compared to the S10 result for the 2D analysis, cr 8 (Sl A //0.3) a62 = 0.62 ±0.11. 

Compared to S10, our results show a 50% broader constraint on as, as well as a stronger 
degeneracy between as and Qm (reflected in the exponent of the relation) . This discrepancy 
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Figure 5.7: The Fourier power represented by each KL mode. For each KL mode number, 
the vertical band shows the distribution of power with angular wavenumber i. In general, 
the larger KL modes correspond to larger values of £, though there is significant mode 
mixing. 
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is likely due to the fact that we use only the bright galaxies in this analysis, while the S10 
results use both bright and faint galaxies, as well as the fact that S10 marginalizes over 
nuisance parameters (the Hubble parameter and redshift systematic corrections) while we 
fix these at the expected values. 

S10 notes that the low value of as seen in their 2D analysis is likely an artifact of cosmic 
variance: the strongest contributions to lensing signals in COSMOS are from z > 0.7, which 
boosts the shear signal for higher redshift sources but leads to a lower signal at intermediate 
redshifts. Using the full 3D analysis, S10 is able to separate these regions, leading to results 
consistent with those from WMAP. 

Here we have limited the analysis to two dimensions, but this is by no means a funda- 
mental limitation of KL. As long as we can sufficiently estimate the correlation of signal and 
noise, KL can be used to analyze an arbitrary geometry: in future work we will extend the 
present analysis to three dimensions, fully taking into account the redshifts of the sources. 

5.6 Next Steps 

The above analysis presents a firm basis for further exploring the ability of KL to provide a 
natural basis for extracting two-point information from weak lensing surveys. Further work 
is needed to fully understand the effect of the mode truncation on results, as well as other 
effects such as the pixel size, the assumptions of noise, and the effect of the assumed fiducial 
cosmology. 

There are also some potentially interesting and useful features of the algorithm: first of 
all, the KL framework naturally extends from two dimensions to three. Unlike the rigidly 
tomographic approach used in conventional correlation function studies, KL allows each 
pixel to have its redshift distribution individually specified, potentially leading to a more 
robust use of source redshift information. For this reason, KL is a promising technique for 
a full 3D analysis, and will give insight into the density and perhaps equation of state of 
dark energy. 

Second, assumptions about noise and bias can be built-in to the KL model. For example, 
S10 does a careful job of correcting for the shape of the HST PSF before computing the 
observed correlation function. With KL, we could instead account for these biases in the KL 
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Figure 5.8: The posterior distribution in the erg) plane for a 2D analysis of the bright 
galaxy sample. This uses 800 of the 1640 modes, such that the truncated modes have 
average signal-to- noise ratios of <~ 1/10, and an approximate angular scale of I ~ 7000, 
which corresponds to 3 arcmin or 1.5 pixel- lengths. 
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basis itself, allowing us to perform our cosmological analysis one step closer to the observed 
data. 

Third, there is the question of how this approach can scale from the one square degree 
of COSMOS to the 20,000 square degrees of LSST. The LSST weak lensing analysis has 
potential to give very tight constraints on cosmological parameters, especially the possible 
evolution of dark energy. It will be increasingly important to address and explore how this 
KL framework can be scaled to the size of future surveys. 
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Chapter 6 
CONCLUSION 

In the above chapters, we have developed the Karhunen-Loeve analysis as a useful tool 
for several aspects of the analysis of present and future weak lensing surveys. In Chapter 2, 
we discussed the details of the KL formalism. We showed that KL is a powerful technique 
that enables data to be represented as a linear combination of orthogonal modes that are 
constructed such that the modes are optimal representations of the signal-to-noise ratio. 

In Chapter 3, we demonstrated that KL can be used to construct an optimal linear 
framework for the mapping of three dimensional structure from weak lensing surveys. The 
KL filtering leads to an algorithm which is orders of magnitude faster than previously 
studied approaches, and allows quantitative constraints on the effectiveness of mapping for 
given survey depths and geometries. 

In Chapter 4, we demonstrated that KL can be used to address a practical problem for 
two and three dimensional mass mapping: the interpolation of shear signal across masked 
regions of a given survey. The reconstruction takes into account theoretical expectations 
of the shear correlation, and results in peak counts that are more consistent with those of 
the underlying distribution. The KL approach also results in a natural filtration of low- 
magnitude noisy peaks, which has the potential to increase the performance of cosmological 
likelihood calculations from peak statistics of shear. 

In Chapter 5, we show how KL can be used directly as a tool to derive cosmological 
parameter constraints from two-point information within a Bayesian inference framework. 
Because KL can naturally account for arbitrary survey masks and geometries, it allows for 
robust determination of likelihoods without the need for computationally expensive cali- 
bration against N-body simulations. As a proof-of-concept, we perform a two-dimensional 
likelihood analysis to derive constraints on cr§ and CIm which are consistent with those 
derived from conventional correlation-function approaches using the same data. 
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In these three important areas of weak lensing analysis, the KL approach has proven 
valuable in addressing the practical problems associated with the science of weak lensing. 
KL's robust, computationally efficient approach has the potential to be very useful in many 
areas of future weak lensing science. 

This thesis opens nearly as many questions as it answers. In the future, we would like 
to explore more deeply how the shear KL basis can be used to address real problems in 
data analysis. Chapters 4 and 5 end with discussions of some remaining questions: can the 
KL analysis of projected shear peaks lead to robust cosmology constraints from realistic 
datasets? Can the KL power spectrum approach be extended to 3D and make use of our 
knowledge about the correlated statistical and systematic errors in real weak lensing data? 

Beyond that, we can ask other questions: is KL the best basis to use for these sorts of 
studies? In chapter 2, we show that KL gives the optimal low-rank reconstruction and noise 
filtering among all possible linear, orthonormal bases. This does not, however, exclude the 
possibility of using other non-standard representations of the data, such as non-orthonormal 
or overcomplete bases. Such approaches are common in the fields of sparse coding and 
compressed sensing, and may also be fruitful methods within the field of weak lensing. 
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Appendix A 

RANDOM FIELDS, CORRELATION FUNCTIONS, AND POWER 

SPECTRA 



In this appendix we discuss some of the details of the mathematics behind random fields, 
and their correlation functions and power spectra. In Appendix A. 2 we apply this to the 



cosmological density field introduced in Chapter 1, and from this define the power spectrum 
normalization as- Some common window functions and their Fourier transforms are listed 
in Appendix |A.3 



A.l Background on Gaussian random fields 

Consider a field g(x) in n dimensions. We'll enforce a few restrictions on this field to make 
it easier to work with. Note that (•) denotes a volume-average: 

1. vanishing: (g(x)) = for all x. 

2. homogeneous: g(x + y) is statistically equivalent to g(x) for all x and y. 

3. isotropic: g(Rx) is statistically equivalent to g{x) for all x and any unitary rotation 
matrix R. 

These conditions become very useful when we study the (auto) correlation function, defined 
as 

C gg (r) = (g(x)g*(x + r)) (A.l) 
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which for a homogeneous and isotropic field depends only on the distance r = \r\ . It becomes 
useful to decompose g into orthogonal Fourier components Q 

9{S) = J ^® e ~ AS ' (A ' 2) 



g(k) = J d n xg(x)e^ k (A.3) 
From these, we can see that the n-dimensional Dirac delta function can be written 

51{S ~ ^ = j^r I dnke ' iHs ~ s,) (A - 4) 

such that 

r d n xf(x )8%{x -x) = f(x") (A.5) 



We now define the Power Spectrum of g to be the Fourier transform of the auto- 
correlation function, which, due to isotropy, depends only on the magnitude of k: 



P g (k) = J d n xe-™ rk C gg (x) 



C 99 {x) = (2^/ d n ke^P g (k) (A.6) 

A bit of math shows that the Power Spectrum is proportional to the Fourier-space correlation 
function: 

C gg (k-k') = (g(k)g*(k')) = (27rr5 n D (k-k')P g (\k\). (A.7) 
Along with isotropy and homogeneity, this result implies 

P g (k) oc (\g(k)\ 2 ) ■ (A.8) 

The proportionality constant is finite only for a discrete Fourier series (i.e. a finite averaging 
volume) . 



Note that the Fourier transform convention in eqns A. 2 A.3 is useful in that it leads to a particularly 



simple form of the convolution theorem, without any gratuitous factors of \/2tt: 
h(x) = J d n x'f(x')g(x-x') h(k) = f(k)g(k) 
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A. 1.1 Smoothing of Gaussian Fields 



When measuring a realization of Gaussian field, we often make the measurement within a 
region defined by a window function W{x/R). By convention, we express window functions 
in terms of x/R, where R is a characteristic length scale of the window. This window may 
reflect a sharp boundary in space (e.g. a spherical tophat function) or perhaps an observation 
efficiency in space (e.g. a 3D Gaussian). In either case, our observed overdensity is given 
by 



9W 



(£) = J d n x'W 



X — X 



R 



g(x') 



(A.9) 



where W{x/R) is normalized such that 

d n xW(x/R) = 1. (A.10) 
This can be simplified if we can define the Fourier transform pair of a window function in 



the convention of equations A. 2 and A. 3 (cf. Liddle & Lyth 2000): 



W(kR) = I W(x/R)e ik -*d n x 



W{x/R) 



1 



W(kR)e- ik -*d n k 



(A.ll) 



This definition is convenient because, when combined with equation A. 2 some straightfor- 
ward algebra leads to the convolution theorem: 



g w {k) = W(kR)g(k) 



It is also useful to calculate the cross-correlation between two windows, 



(A.12) 



(g Wl (xi)gw2&)) = j d n xWi 



R 



d n x'W 2 



R 



(g(x)g*(x')) (A.13) 



Using equations A. 6 and A.ll we can re-express equation A.13 as a single integral over the 
wave number: 



(9Wi(xi)gw 2 {x 2 )) 



1 



(27T)' 



- / d n kP g {k)W l {kR)W 2 {kR)e tk - 



ik-(xi —X2) 



(A.14) 



Appendix |A.3 lists a few common window functions and their Fourier transforms. 
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A. 2 Cosmological Mass Power Spectrum 

In studies of the cosmological distribution of matter, we are interested in the comoving 
matter density p(x), which defines the mass density at every comoving point x in the 
universe. In order to take advantage of the preceding formalism, we can subtract the mean 
cosmological density p(x), and define a dimensionless density contrast S(x), such that 

= «1. ,A, 5 ) 

By the assumptions of the Cosmological Principle, for small deviations 5(x) is an isotropic, 
homogeneous random field. We can better understand the distribution of 5(x) by looking 
at the mean square deviation 



(\5(x)\ 2 ) = Css(0) 
1 



d 3 kP s (k) 



(2tt) ; 

^ J k 2 dkP 5 (k) (A.16) 



The power spectrum of density contrast given by equation A. 6 can be an inconvenient 



quantity to work with, because it has dimensions of volume. We can take the lead from 



equation A.16 and define a dimensionless form of the power spectrum 

k 3 



A 2 (*0 = ^2 p dk) (A.17) 



This is constructed so that equation A.16 can be written in a simple form: 



roc 

(\5(x)\ 2 )= / A 2 (£;)d(ln£;) (A.18) 
Jo 

This convention is due to Peacock (1999). For mathematical convenience, we'll continue to 
work with the P$(k) convention, with the understanding that we can switch back and forth 
any time using equation |A.17| 

A. 2.1 Power Spectrum Normalization 

In practice, the functional form of the power spectrum is determined only up to a propor- 
tionality constant, such that 

Ps(k) = PoP' s {k) (A.19) 



135 



where Pg(k) is the unnormalized form. For historical reasons, the normalization constant Po 
is commonly expressed in terms of the parameter as , which is defined as the mean density 
fluctuation within a sphere of radius 8 Mpc. To compute this, we use a top-hat window 
function: 

' 1, \x\/R < 1 
0, \x\/R>l 



W T (x/R) 



(A.20) 



The density fluctuation within this window is found using equation A. 14 

1 



4 



(2tt) s 



d 6 kP s {k)[W T (kR)Y 



(A.21) 



where the window function is assumed to be shallow enough that there is no cosmological 
evolution of the signal. 

For the top-hat window function of equation 



A.20 



with k = \k\, the Fourier transform 



of equation A.20 (cf. eqn. A. 11) is 



W T (kR) 



(kRf 



[sm(kR) - kRcos(kR)] . 



(A.22) 



as can be calculated using equation |A.21 and A.22 with R = 8Mpc for a given P$(k). The 



WMAP 7-year measurement gives as = 0.816 ± 0.024 (Komatsu et al. 2011). Using this 



value, the correct normalization can be computed for any functional form of the power 
spectrum. 



A. 2. 2 Window functions and Measurement Covariance 

In a 3D lensing analysis, we are searching for a signal within a series of windows defined as 



W ij (x) = W ij (^9)=q i (O-Fj(0) 



(A.23) 



where £ is the radial comoving distance, and 9 is the angular position on the sky. To convert 
between angle on the sky and projected comoving separation, we multiply by the transverse 
comoving distance (eqn. 16 in Hogg 1999), given by 

K-^simy/ 2 ^) (k>0) 

£ (« = 0) (A.24) 

(-k)-V2 s inh[(-K)i/2^ ( K <o) 
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Our observed over density in window Wu is given by 



d 3 xW ij (x)5(x) 



(A.25) 



Fj is the function describing the shape of the j pixel, while qi is the function describing 
the i th redshift bin. These window functions should be normalized as in equation A. 10, such 
that 

J dZ[f K (w)] 2 qi (0 = 1 (A.26) 

and 



J d 2 6Fj(9) = 1 



We are usually concerned with the covariance matrix of the signal, given by 

d 2 6F Jn [e] J d 2 e'F Jm [6'] 

This can be simplified using the Limber approximation. 
A. 2. 3 The Limber Approximation 

Consider a projection of the density field along a certain radial direction 

9i @) = j d£pi(0* 

The cross correlation is 



S, 



f) 



dfriio I de Pj (o(s[uoo^}s*[m'W,e. 



Let's express 5(x) in terms of the Fourier integral, equation A. 2 This gives 

d 3 k f d 3 k' 



s 9 , 9] {e-e') 



dip, 



x exp 



(0 / di'pM') 



(2vr) 3 J (2tt) 3 



exp 



if K (£)0 -k^+iKw' . (A.31) 



(A.27) 



(A.28) 



(A.29) 



(A.30) 
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Here k± is the 2-dimensional projection of k perpendicular to the line of sight, and fey is 
the projection of k along the line of sight. The second argument of S(k, £) parametrizes 
evolution with time via |c • dt\ = a ■ d£. 

Because the power spectrum P$(k) decreases linearly with k as k — > 0, there must be a 
coherence scale L c such that the correlation is near zero for |£ — £'| = A£ > L c . The first 
part of the Limber approximation is to assume that S gg vanishes at these distances. Next 
we make the assumption that Pi(£) and Pj(£') do not vary appreciably over the small range 
where S 9igj is non-vanishing, and that this range is small enough that ~ /«(£')• This 

allows us to rewrite the above expression in a simpler way: 

d 3 k f d 3 k' 



S, 



9i9j ' 



7) 



x exp 



(2tt) 3 J (2tt) 3 

if K (0 (0-k±- 6' ■ k' ± ) - iefc[| / dC' exp (^k\ ) (A.32) 



A.4 



The integral over £' is simply 27T^£>(A;|,) via equation 
function is proportional to Ps(k)5jj(k — k') via equation 



and the Fourier space correlation 



A.7 



dWOl'jtO I ^ / d 3 k%(k-k , )S D (k\ l )P l) (h ) 



x exp 



iUO (0 ■ k ± - 6' ■ k' ± ) -i£k\\ 



Carrying out the integrals over the two delta functions we see 

d 2 k 



S, 



9i9j \ U V 



dtiPi(0Pj(0 
dfiPi(0Pj(0 



Ps(k±,^)e^p -if K (Q0ij-k 



(27T) 2 " 
kdh 



(A.33) 

(A.34) 
(A.35) 



where we have defined 9 



9', and Jq(x) is a Bessel function of the first kind, which 



comes from the angular integral via 

J n (x) - 



1 

2^ 



-{(nr—xsinr) 



(A.36) 



We see that all fey terms have vanished, which leads to the main result of the Limber 
approximation: there is no correlation between the density contrast in windows that do not 



overlap in redshift. This is quickly seen from the leading integral in equation A.34 If the 
windows and Pj(£) do not overlap, then the expression integrates to zero. 
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Now that we have a simple expression for the 2-dimensional projected correlation func- 
tions, we can use equation A. 6 to define the 2-dimensional cross power spectrum of the 
measured covariance, 



P 



9i9j 



(A.37) 



Combining equations A. 34 and A.37 we have 



d 2 k ± 

(2*)* 



*fri®Pi(£) / d 2 k ± P s (k ± ,OS 2 D (£-m)kA 



(A.38) 



A. 2. 4 Applying the Limber Approximation 



Examining equation A. 28 we see that the integrals over £ and £' appear in equation A. 30 
with pi(£) -> 2 and Pj(€') ~> 9i m (£')[/« (0) 2 - Tnus we can rewrite equation 



A,28| using the Limber approximation: 



[S, 



551 



d 2 8F ln [8] / d 2 e'F Jm [e']s nm {e-e') 



(A.39) 



Now using the procedure from section |A.1.1[ we can write this approximation as an integral 



in Fourier space over the cross power spectrum given by equation A.38 We'll consider 
the simple case where each pixel has the same angular shape described by F(6), so that 
Fjn 0) = F{6- 6 n ) and F Jm 0) = F(6 - 6 m ). Defining 

1 



"n "hi i 



[S, 



S5\ 



(27T) 2 
1 

2^ 



d 2 £F jn $F jm (i)P nm (£y e - e ™ 
9 F jn (£)F jm (£)P nm (£)J (£e nm 



(A.40) 



where the second line holds if the window functions are circularly symmetric, with the 
Bessel function given by equation A. 36 The projected power spectrum P nm (£) is given by 
equation A.38 with appropriate substitution for pij: 



(A.41) 
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Note the factor of [/k(0] 2 m the numerator, which has its root in the spherical coordinate 
differential d 3 x — > r 2 dr ■ dfl. 

Our main application of this formalism will involve discrete non-overlapping redshift 
bins with uniform weighting. That is, 

,(0 = < ^ ^ n<e< ^ ax (A.42) 
0, otherwise 



with the normalization constant computed via the condition in equation A. 26 



A 



■ w 



Smax 



-i 



(A.43) 



To summarize, the correlation between two windows n and m becomes, with i n ^ m indexing 
the redshift window, and n, m indexing the angular window, 

[Sss]nm = 5 inim U)(\6 nm \) 

1 r°° ~ 
= s- / m\F(i)\ 2 p nm (e)j (ie) 

^ Jo 

j Smax 



.<(*) 

/ smax 

^4 = n ^[UO? (A.44) 



where bf- is the Kronecker delta. This result should be compared to equations 39-41 of 



Simon 



et al. (2009). The only difference is that we have correctly accounted for the normalization 



of the redshift bin. Note that if we assume is constant across each redshift bin, the 

two formulations are equivalent. 

For our analysis, we will use angular pixels with radius 9 S , so that the Fourier transform of 



the window function is given by equation A. 51 and the equation giving the signal covariance 
becomes 

[S6s] nm = J|| d ^ ln f^ ] J J p s [ J i(^)] 2j °(^) (A. 45) 

A. 3 Window Functions and their Fourier Transforms 



Here we list a few common window functions and their Fourier transforms 
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A. 3.1 Gaussian Window Functions 

The n-dimensional Gaussian window function is denned as 



! /-If 12 



^'(^^Iwj (A - 46) 

The Fourier transform is straightforward because the dimensions decouple and we're left 
with n 1-dimensional Gaussian integrals. The resulting window function is 

W(kR) = exp ^ -l^ 2 ^ (A>47) 

which itself is a Gaussian. 



A. 3. 2 Top-hat Window Functions 

The n-dimensional tophat window function is given by 



, 1, \x\ < R 

W(x/R) = A n x { (A.48) 
0, |f| > R 



A, = T -m^ (A.49) 



with 

A„ = l 

(Ry/iF) r 

where T(y) is the gamma function. The normalization is simply the inverse of the volume of 
an n-sphere of radius R. For n = 2 and n = 3, the normalizations are the familiar l/{itR 2 ) 
and 3/(47ri? 3 ), respectively. For the top-hat window function, there is no simple expression 
for the Fourier transform for arbitrary n. Here we compute three special cases: 



n = 1: 



A 1 = (2R)- 1 

, rR 

W(kR) = A x / dre lkr 

J-R 

(A.50) 



-R 

sin(kR) 
kR 
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n = 2: 



n = 3: 



iy(/c-R) = A2 rdr / expfz&r cos 4>]d(f> 
Jo Jo 



R 2 jo 

2J 1 (kR) 
kR 



r Jo(/cr)dr 



(A.51) 



A, 



W{kR) 







(4^ 3 /3) _1 

/*7T pill 

As / r 2 dr / sin(#)d# / exp[z/crcos 
Wo 

1 







- 1, 2 , 



r Jo(kr)dr 
5 -fc 2 i? 2 . 



(A.52) 



where, J n (x) are Bessel functions of the first kind (see eqn A. 36) and the last line is a 



generalized hypergeometric function, n F m {a,Q ■ ■ ■ a n ; bo ■ ■ ■ b m ; x). 
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Appendix B 

EFFICIENT IMPLEMENTATION OF THE SVD ESTIMATOR 



As noted in Section 



3.2.2 



' — " —1/2 

taking the SVD of the transformation matrix = A/^-y M, 



yd 



is not trivial for large fields. This appendix will first give a rough outline of the form of 
Myg, then describe our tensor decomposition method which enables quick calculation of 
the singular value decomposition. For a more thorough review of the lensing results, see 



e.g. Bartelmann & Schneider (2001). 



Our goal is to speed the computation of the SVD by writing M^s as a tensor product 
A<S>B. Here "<g>" is the Kronecker product, defined such that, if A is a matrix of size nxm, 
B is a matrix of arbitrary size, 

' A n B A 12 B ■■■ A lm B ^ 

A 21 B A 22 B ■■■ A 2m B 
A(g>B= _ _ _ (B.l) 

\ A n \B A n2 B ■ ■ ■ A nm B J 
In this case, the singular value decomposition A ® B = Uab^abVab satisfies 

Uab = U A ®U B 
^AB = <8> ^B 

V AB = V A ®V B (B.2) 

where C/^S^V^ is the SVD of A, and C/bSbV^ is the SVD of B. Decomposing M 7 ^ in this 
way can greatly speed the SVD computation. 

B.0.3 Angular and Line-of-Sight Transformations 

The transformation from shear to density, encoded in M~g, consists of two steps: an an- 
gular integral relating shear 7 to convergence k, and a line-of-sight integral relating the 
convergence k to the density contrast 5. 
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The relationship between 7 and k is a convolution over all angular scales, 

7(0, z a ) = lx + i l2 = J dV v(e' - o)k{o', z s ), 



(B.3) 



where T>(0) is the Kaiser-Squires kernel (Kaiser &; Squires, 1993). This has a particularly 
simple form in Fourier space: 



z 8 ) = 1 + i ^ k(£,z s ). 



(B.4) 



?i - tf 2 

where 7 and k are the Fourier transforms of 7 and k and ^ = (£1,^2) is the angular wavenum- 
ber. 

The relationship between k and 5 is an integral along each line of sight: 

(B.5) 



k(0,z.) = / dz W(z,z s )6(0,z) 
Jo 



where W(z, z s ) is the lensing efficiency function at redshift z for a source located at redshift 
z s (refer to STH09 for the form of this function). 



Upon discretization of the quantities 7, k, and 5 (described in Section 3.2.1 ), the integrals 



in Equations B.3 B.5 become matrix operations. The relationship between the data vectors 
7 and k can be written 

7= [P 7K ® l s ] K + n 7 (B.6) 
where l s is the -/V s x N s identity matrix and P^ K is the matrix representing the linear 



transformation in Equations B.3 B.4 The quantity [Pj K <S> l s ] simply denotes that P 



i k 



operates on each of the N s source-planes represented within the vector k. Similarly, the 
relationship between the vectors k and d can be written 



K = [1- 



(B.7) 



where l xy is the N xy x N xy identity matrix, and the tensor product signifies that the 
operator Q K $ operates on each of the N xy lines-of-sight in S. Q K $ is the N s x N\ matrix 



which represents the discretized version of equation B.5 Combining these representations 



allows us to decompose the matrix M^s m Equation 3.1 into a tensor product: 



M 



7<5 



7ft 



QkS- 



(B.8) 
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B.0-4 Tensor Decomposition of the Transformation 

We now make an approximation that the noise covariance A/" 77 can be written as a tensor 
product between its angular part A/p and its line of sight part A/q: 

M ry =Mp®M Q . (B.9) 

Because shear measurement error comes primarily from shot noise, this approximation is 
equivalent to the statement that source galaxies are drawn from a single redshift distribution, 
with a different normalization along each line-of-sight. For realistic data, this approximation 
will break down as the size of the pixels becomes very small. We will assume here for 
simplicity that the noise covariance is diagonal, but the following results can be generalized 
for non-diagonal noise. Using this noise covariance approximation, we can compute the 
SVDs of the components of M 7 s- 

Up-EpV^p = N P 1/2 P 1K 

UqEqV ] q =M q 1,2 Q kS (B.10) 
In practice the SVD of the matrix Py K need not be computed explicitly. P^ K encodes 



the discrete linear operation expressed by Equations |B.3pT4| as pointed out by STH09, in 
the large-field limit P 7K can be equivalently computed in either real or Fourier space. Thus 
to operate with P 7K on a shear vector, we first take the 2D Fast Fourier Transform (FFT) 
of each source-plane, multiply by the kernel [l\ + H2)/ —i&2), then take the inverse FFT 
of the result. This is orders-of- magnitude faster than a discrete implementation of the real- 
space convolution. Furthermore, the conjugate transpose of this operation can be computed 
by transforming t — > — £*, so that 

P\ K P JK = I (B.ll) 

and we see that P 7K is unitary in the wide-field limit. This fact, along with the tensor 
product properties of the SVD, allows us to write M 1 $ = ITEV^ where 



U « l xy U Q 

W » P 1K ®V^ (B.12) 
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The only explicit SVD we need to calculate is that of Mq Q k si which is trivial in cases of 
interest. The two approximations we have made are the applicability of the Fourier-space 



form of the 7 — > k mapping (Eqn. B.4), and the tensor decomposition of the noise covariance 
(Eqn.[R9l). 
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Appendix C 
CHOICE OF KL PARAMETERS 



The KL analysis outlined in Section 4.2 has only two free parameters: the number of 
modes n and the Wiener filtering level a. Each of these parameters involves a trade-off: 
using more modes increases the amount of information used in the reconstruction, but at 
the expense of a decreased signal-to-noise ratio. Decreasing the value of a to reduces the 
smoothing effect of the prior, but can lead to a nearly singular convolution matrix Mf n>a \, 
which results in unrealistically large shear values in the poorly-constrained areas areas of 
the map (i.e. masked regions). 

To inform our choice of the number of modes n, we recall the trend of spatial scale 



with mode number seen in Figure 4.2 Our purpose in using KL is to allow interpolation 
in masked regions. To this end, the angular scale of the mask should inform the choice of 
angular scale of the largest mode used. An eigenmode which probes scales much smaller than 
the size of a masked region will not contribute meaningful information to the reconstruction 
within that masked region. Considering the pixels within our mask, we find that 99.5% of 
masked pixels are within 2 pixels of a shear measurement. This corresponds to an angular 
scale of I = 6140. Consulting Figure |4.2[ we see that modes larger than about n = 900 out 
of 4096 will probe length scales significantly smaller than the mask scale. Thus, we choose 
n = 900 as an appropriate cutoff for our reconstructions. 

To inform our choice of the Wiener filtering level a, we examine the agreement be- 
tween histograms of M ap peaks for a noise-only DES field with and without masking (see 



Section 4.4). We find that for large (small) values of a, the number of high-M ap peaks 
is underestimated (overestimated) in the masked compared to the unmasked case. 

Empirically, we find that the two agree at a = 0.15; we choose this value for our analysis. 
Note that this tuning is done on noise-only reconstructions, which can be generated for 
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observed data by assuming that shape noise dominates: 

[Sfjij = (C.l) 

The a-tuning can thus be performed on artificial noise realizations which match the observed 
survey characteristics. 

We make no claim that (n, a) = (900,0.15) is the optimal choice of free parameters 
for KL: determining this would involve a more in-depth analysis. They are simply well- 
motivated choices which we use to make a case for further study. 
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